Opened 11 years ago

Closed 11 years ago

Last modified 10 years ago

#2551 closed Enhancement (fixed)

when uploading to peers, prefetch local data from disk

Reported by: jch Owned by: charles
Priority: Low Milestone: 2.00
Component: libtransmission Version: 1.76
Severity: Minor Keywords: performance
Cc:

Description

The attached patch causes a block to be prefetched from disk as soon as a request is queued by the client. This wins twice: first, it makes it less likely that we block on a disk read (which would block all of Transmission, since we're not doing reads in a dedicated thread); second, it makes it more likely that the kernel will be able to coalesce reads.
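
For illustration, a minimal sketch of the mechanism, assuming a HAVE_POSIX_FADVISE configure macro; the wrapper name is made up, and only posix_fadvise() itself is what the patch relies on:

    #include <fcntl.h>      /* posix_fadvise, POSIX_FADV_WILLNEED */
    #include <sys/types.h>  /* off_t */

    /* Illustrative wrapper: as soon as a peer's request is queued, ask the
     * kernel to start reading that block in the background.
     * posix_fadvise() is advisory and returns without blocking. */
    static void
    prefetchBlock( int fd, off_t offset, off_t length )
    {
    #ifdef HAVE_POSIX_FADVISE
        posix_fadvise( fd, offset, length, POSIX_FADV_WILLNEED );
    #else
        (void) fd; (void) offset; (void) length;
    #endif
    }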

In Hekate, I use a somewhat more complex strategy: I prefetch at request time, then check for residency once; if the block is not in core, I yield to other clients, and if it is still not in core after yielding, I fall back to a dedicated thread. In my experiments, on a slow laptop with a purged buffer cache, when seeding at 1MB/s the dedicated thread is used for 1 in 10 blocks.
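
A rough sketch of that residency check, assuming the relevant file region has been mmap()ed and using Linux mincore() semantics (Hekate's actual code differs):

    #include <stdbool.h>
    #include <sys/mman.h>   /* mincore */
    #include <unistd.h>     /* sysconf */

    /* Sketch: returns true iff every page of the block is resident.
     * `map` must be the page-aligned start of the mmap()ed block. */
    static bool
    blockIsInCore( void * map, size_t blockLength )
    {
        const size_t pageSize = (size_t) sysconf( _SC_PAGESIZE );
        const size_t nPages = ( blockLength + pageSize - 1 ) / pageSize;
        unsigned char vec[256];   /* one byte per page; enough for a block */

        if( nPages > sizeof vec )
            return false;         /* unexpectedly large; treat as not resident */
        if( mincore( map, blockLength, vec ) != 0 )
            return false;

        for( size_t i = 0; i < nPages; ++i )
            if( ( vec[i] & 1 ) == 0 )   /* bit 0 set means page is in core */
                return false;
        return true;
    }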

This is ready to apply, except that you might want to put the right #ifdef around the call to posix_fadvise at inout.c:161.

--Juliusz

Attachments (10)

prefetch.patch (4.9 KB) - added by jch 11 years ago.
prefetch-base.patch (6.0 KB) - added by jch 11 years ago.
prefetch.2.patch (2.9 KB) - added by jch 11 years ago.
prefetch-verify.patch (441 bytes) - added by jch 11 years ago.
prefetch.3.patch (3.2 KB) - added by charles 11 years ago.
revision of jch's prefetch.2.patch, tweaked to handle r9494
prefetch-forgotten.patch (2.4 KB) - added by jch 11 years ago.
You forgot to prefetch when receiving a request, Charles
prefetch-no-sequential.patch (1.1 KB) - added by jch 11 years ago.
0001-Prefetch-when-receiving-request.patch (2.0 KB) - added by jch 11 years ago.
0002-Prefetch-when-verifying.patch (745 bytes) - added by jch 11 years ago.
0003-Don-t-force-sequential-readahead-when-opening-files.patch (1.5 KB) - added by jch 11 years ago.


Change History (38)

Changed 11 years ago by jch

comment:1 follow-up: Changed 11 years ago by charles

jch: this is a good idea and a good start, but the patch has a couple of issues:

  • The patch as it is will break Transmission on non-Linux systems.
  • A lot of time can pass between when the request arrives and when the block is sent -- the requests pile up deep when speed limits are turned on, for example. If posix_fadvise() gets too many of these calls, will the older ones start to page out? If so, we'll be doing more harm than good by causing two reads instead of one. Maybe in fillOutputBuffer(), when we write the next block to the output buffer, we could call tr_ioPrefetch() on the second block in the list. Or maybe that gives too little advance warning to the OS... are there Best Practices for the timing of when to call POSIX_FADV_WILLNEED?
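
To make the fillOutputBuffer() idea concrete, a rough sketch with simplified placeholder types and names (not Transmission's real structures):

    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/types.h>

    struct block_request { off_t offset; size_t length; };  /* simplified */

    /* Sketch of the suggestion above: when sending the block at the head
     * of the request queue, prefetch the one right behind it, so the
     * kernel has a head start by the time we need it. */
    static void
    sendNextBlock( int fd, const struct block_request * reqs, size_t n_reqs )
    {
    #ifdef POSIX_FADV_WILLNEED
        if( n_reqs > 1 )
            posix_fadvise( fd, reqs[1].offset, (off_t) reqs[1].length,
                           POSIX_FADV_WILLNEED );
    #endif

        /* ...then read reqs[0] from disk and write it to the peer... */
    }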

comment:2 in reply to: ↑ 1 ; follow-up: Changed 11 years ago by jch

Replying to charles:

  • The patch as it is will break Transmission on non-Linux systems.

You offend me ;-)

I code for POSIX, not for Linux. This patch will break Transmission on systems that don't implement the 2001 edition of POSIX. Such systems should be fairly rare nowadays (it's been eight years, after all), and if there are any, we can simply #ifdef away the calls to posix_fadvise.

  • A lot of time can pass between when the request arrives and when the block is sent -- the requests pile up deep when speed limits are turned on, for example.

You're right, as usual. Give me a few days to think it over.

--Juliusz

comment:3 in reply to: ↑ 2 Changed 11 years ago by charles

Replying to jch:

You're right,

:)

as usual

*cough* *cough*

comment:4 Changed 11 years ago by jch

Okay, here's my second attempt.

This time, we maintain a bounded queue of prefetched blocks for each unchoked peer; the queue is normally at most 8 blocks deep, but it may grow to 12 blocks when doing so keeps the disk reads sequential.
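
A sketch of the depth rule, with illustrative names and the constants above; the queue grows past 8 only while the next request continues exactly where the previous one ends:

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/types.h>

    #define PREFETCH_MAX            8   /* normal queue depth */
    #define PREFETCH_MAX_SEQUENTIAL 12  /* allowed while reads stay sequential */

    struct block_request { off_t offset; size_t length; };  /* simplified */

    /* Decide whether request i may still be prefetched, given how many
     * blocks are already in flight for this peer. */
    static bool
    mayPrefetch( const struct block_request * reqs, size_t i, size_t inFlight )
    {
        if( inFlight < PREFETCH_MAX )
            return true;

        /* beyond 8, keep going only if this request starts exactly where
         * the previous one ended, i.e. the disk reads are sequential */
        return inFlight < PREFETCH_MAX_SEQUENTIAL
            && i > 0
            && reqs[i - 1].offset + (off_t) reqs[i - 1].length == reqs[i].offset;
    }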

I've split it into two patches: prefetch-base.patch doesn't do anything by itself, it just sets up the common infrastructure for this and #1521, while prefetch.patch is the actual prefetching code.

I'd be grateful if you could consider committing prefetch-base whether you actually commit the prefetching code or not, since it'll simplify the life of people dealing with #1521.

--Juliusz

Changed 11 years ago by jch

Changed 11 years ago by jch

comment:5 Changed 11 years ago by jch

Oh, one missed place: when verifying a completed piece.

(By the way, Charles, is the stackbuf hack really worth it? It causes you to read into a non-page-aligned buffer; why not simply have a 16 kB page-aligned buffer in each struct session and use that?)
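
For reference, the page-aligned buffer could be set up once per session like this (a sketch of the proposal, not existing Transmission code):

    #include <stdlib.h>   /* posix_memalign, free */
    #include <unistd.h>   /* sysconf */

    /* Sketch: allocate one 16 kB page-aligned scratch buffer per session,
     * instead of reading into a non-aligned buffer on the stack. */
    static void *
    allocSessionBuffer( void )
    {
        void * buf = NULL;
        const size_t pageSize = (size_t) sysconf( _SC_PAGESIZE );

        if( posix_memalign( &buf, pageSize, 16 * 1024 ) != 0 )
            return NULL;  /* allocation failed */

        return buf;       /* caller frees with free() */
    }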

--Juliusz

Changed 11 years ago by jch

comment:6 Changed 11 years ago by charles

prefetch-base.patch committed in r9495

comment:7 Changed 11 years ago by charles

jch: here's a revision of the patch for you to proofread, to make sure I haven't broken too much of the previous patch. I think there might have been a bug in prefetch.2.patch: with the "next" variable, it compared req1's offset+length against req2's offset+length to decide whether 12 should be used instead of 8...

would prefetch-verify.patch actually do any good? The prefetch call would come immediately before the tr_ioRead()...

Changed 11 years ago by charles

revision of jch's prefetch.2.patch, tweaked to handle r9494

comment:8 Changed 11 years ago by charles

xref: r9500, which is a follow-on to r9494

Changed 11 years ago by jch

You forgot to prefetch when receiving a request, Charles

comment:9 Changed 11 years ago by jch

would prefetch-verify.patch actually do any good?

Not clear either way. On the one hand, we use small reads to verify a piece, so hinting to the kernel that the whole piece is about to be read might be a good idea. On the other hand, we're reading the piece sequentially, so there's little chance that the explicit prefetching will be more effective than the kernel's read-ahead.

What is clear is that it cannot possibly do any harm. Since we're going to read in the whole piece, we're going to pollute our cache with it whether we prefetch or not. So we might as well give a hint to the kernel.

All in all, I recommend applying it.
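
The whole change amounts to a hint of this shape (a sketch with a made-up helper name; the real change is in prefetch-verify.patch above):

    #include <fcntl.h>
    #include <sys/types.h>

    /* Sketch: before the small sequential reads that verify a piece, hint
     * to the kernel that the whole piece is about to be read. */
    static void
    prefetchPieceForVerify( int fd, off_t pieceOffset, off_t pieceSize )
    {
    #ifdef POSIX_FADV_WILLNEED
        posix_fadvise( fd, pieceOffset, pieceSize, POSIX_FADV_WILLNEED );
    #endif
    }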

--Juliusz

comment:10 Changed 11 years ago by jch

When opening a file, we currently fadvise(SEQUENTIAL). Now that we have prefetching, we might be able to avoid this, which will cause less cache pressure.

OTOH, it might very well be the case that the prefetching happens too late, and increasing the kernel's readahead is still beneficial. Hence, I do not recommend applying the following patch unless there is some evidence that it doesn't make things significantly worse.

I do recommend applying it if it is found to make things only a little worse, since reducing cache pressure increases the performance of the whole system (e.g. it avoids swapping out the bloated web browser that I use).

--Juliusz

Changed 11 years ago by jch

comment:11 Changed 11 years ago by charles

You forgot to prefetch when receiving a request, Charles

Well if you're going to worry about little details like that... ;)

comment:12 Changed 11 years ago by charles

...looking over the patch, no I didn't forget. IMO prefetching when the request is received is still a bad idea, for the reasons I outlined earlier. Instead, prefetch.3.patch calls prefetchPieces() right after writing a block... it's a good place because

  1. We know the peer is unchoked at this point
  2. calling prefetchPieces() here will continue to pump the prefetch mechanism as we work through the request queue... calling prefetchPieces() when we receive the requests won't do that.
  3. better odds of the prefetched data being used in the short term

comment:13 Changed 11 years ago by charles

...yes I realize this takes longer to get to steady state, since the first block won't be prefetched at all. IMO that's a small price to pay for limiting the prefetches to only blocks that are likely to be sent in the short term.

comment:14 Changed 11 years ago by charles

  • Milestone changed from None Set to 1.80
  • Owner set to charles
  • Status changed from new to assigned
  • Summary changed from Better buffer cache management to when uploading to peers, prefetch local data data from disk

comment:15 Changed 11 years ago by charles

prefetch.3.patch applied to trunk for 1.80 by r9514.

I'll do some timing tests on the verify call and on no-sequential.

comment:16 Changed 11 years ago by jch

I'm not quite sure what we're disagreeing about -- the strategy or the implementation? If it's the implementation, that can be fixed; if it's the strategy, then it's tougher.

Our strategy is to fight one against a hundred, but our implementation is to fight a hundred against one. (with apologies to Mao Tse Tung)

About the strategy: the invariant I'm trying to implement is the following:

there are always 8 to 12 prefetched blocks in flight for every unchoked peer

This means that we have at most some 300 kB in flight per unchoked peer; if there are 100 unchoked peers at a given time, that's 30 MB of buffer cache, which is reasonable enough. Note that this is clean cache, so the kernel can discard it without any disk operation if it's short of memory.

In practice, however, peers tend to pipeline fewer requests than that, so the amount of prefetching will be much less.

If you disagree with the parameters (8 and 12), this can easily be tuned.

Now if it's the implementation, then you need to prefetch at both places, or else the first request in a pipeline will cause a synchronous read and hence block the whole libevent thread.
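
Schematically, the two call sites I mean, with illustrative names (only the second exists in prefetch.3.patch):

    #include <fcntl.h>
    #include <stdbool.h>
    #include <sys/types.h>

    /* Site 1, on receiving a request: warms the cache for the FIRST block
     * of a pipeline, so the libevent thread never blocks on a cold read. */
    static void
    onRequestReceived( int fd, off_t offset, off_t length, bool allow )
    {
    #ifdef POSIX_FADV_WILLNEED
        if( allow ) /* only prefetch for peers we're actually serving */
            posix_fadvise( fd, offset, length, POSIX_FADV_WILLNEED );
    #endif
    }

    /* Site 2, after writing a block to the output buffer: keeps pumping
     * the queue as it drains -- this is the site prefetch.3.patch has. */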

  1. We know the peer is unchoked at this point

We only prefetch when allow is true, which implies that the peer is unchoked.

  2. calling prefetchPieces [when pushing a block] will continue to pump the prefetch mechanism...

Yep, you need to prefetch at *both* places.

  3. better odds of the prefetched data being used in the short term

That would be disagreement with the strategy outlined above, rather than the implementation, right?

--Juliusz

comment:17 Changed 11 years ago by charles

There are always 8 to 12 prefetched blocks in flight for every unchoked peer

This is a very good starting point, but my concern is unnecessary prefetches. The invariant might be revised this way:

There are 8 to 12 prefetched blocks for every peer we're uploading to

There are several ways that blocks requested by an unchoked peer may not be sent: the peer may send a cancel message, or disconnect from us, or we may rechoke them. prefetch.3.patch doesn't prevent these events from ever happening, but it makes their window of opportunity smaller by deferring the prefetch until shortly before the data is needed.

comment:18 Changed 11 years ago by charles

  • Resolution set to fixed
  • Status changed from assigned to closed

r9596 libtransmission/peer-msgs.c: (trunk libT) #2551: "when uploading to peers, prefetch local data from disk" -- fix peer-msgs.c NULL pointer dereference introduced in r9514 for this ticket. Reported by Waldorf

comment:19 Changed 11 years ago by livings124

  • Summary changed from when uploading to peers, prefetch local data data from disk to when uploading to peers, prefetch local data from disk

comment:20 Changed 11 years ago by jch

  • Milestone changed from 1.80 to Sometime
  • Priority changed from Normal to Low
  • Resolution fixed deleted
  • Status changed from closed to reopened

Charles,

I hope it's okay for me to reopen this bug. I certainly don't think it's important enough to hold 1.80, so I'm setting the milestone to sometime and lowering the priority.

I still think that you should consider prefetch-forgotten. If we are unchoking massive numbers of peers that we have no intention of actually serving, then there's something wrong with our choking strategy, and not prefetching eagerly is just a workaround.

Let me know if you disagree.

--Juliusz

comment:21 Changed 11 years ago by charles

I disagree :)

Unchoking is only one scenario where a requested block might not get served. You also have disconnections and cancel messages. Even in cases where the block is served, prefetch-forgotten has no concept of time. In low bandwidth situations it can be 30-60 seconds before we get to that particular block. If the prefetch expires in that period, we've actually made things worse than before.

comment:22 Changed 11 years ago by charles

  • Milestone changed from Sometime to 1.80

I ran some timing tests this morning. On my system, when seeding at full speed, IO reads accounted for:

  • 6.01% of the runtime with prefetching disabled (as in 1.76)
  • 4.62% of the runtime with prefetching between 8 and 12 blocks (as in 1.80b1)
  • 3.91% of the runtime with prefetching hardcoded at 12.

I'm going to change the prefetch code in peer-msgs.c to automatically prefetch 12 blocks per peer we're uploading to.

comment:23 Changed 11 years ago by charles

  • Resolution set to fixed
  • Status changed from reopened to closed

r9676 (trunk libT) #2551 "when uploading to peers, prefetch local data from disk" -- prefetching performance tweak described in http://trac.transmissionbt.com/ticket/2551#comment:22 -- instead of prefetching [8..12] blocks, always prefetch 12 blocks

comment:24 Changed 11 years ago by jch

  • Milestone changed from 1.80 to Sometime

Resubmitting the missing bits after refreshing them. Reopening the ticket, but marking it as not for 1.80.

I'm still claiming that 0001 improves performance, and that 0003 minimises disk cache pollution without impairing performance.

--Juliusz

comment:25 Changed 11 years ago by jch

  • Resolution fixed deleted
  • Status changed from closed to reopened

Changed 11 years ago by jch

comment:26 Changed 11 years ago by charles

  • Milestone changed from Sometime to 2.00
  • Resolution set to fixed
  • Status changed from reopened to closed

Subject: [PATCH 1/3] Prefetch when receiving request.

I'm a little surprised, but the tests back you up on this: prefetching like this *is* a performance win. Committed to trunk in r10553 for 2.00

Subject: [PATCH 2/3] Prefetch when verifying.

Committed to trunk in r10552 for 2.00

Subject: [PATCH 3/3] Don't force sequential readahead when opening files. This is no longer necessary now that we have proper prefetching.

The thing to notice is that the readahead is applied in a function named tr_open_file_for_scanning(), which is used if and only if we're opening a file to read straight through from beginning to end in a burst. It's not used by the parts of the code that prefetch.
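
Schematically, with an illustrative body (tr_open_file_for_scanning() is the real function name; everything else here is a guess at its shape):

    #include <fcntl.h>

    /* Sketch of the scanning-open path: sequential readahead stays here
     * because this path only ever reads a file straight through. */
    static int
    openFileForScanning( const char * filename )
    {
        const int fd = open( filename, O_RDONLY );
    #ifdef POSIX_FADV_SEQUENTIAL
        if( fd >= 0 )
            posix_fadvise( fd, 0, 0, POSIX_FADV_SEQUENTIAL );
    #endif
        return fd;
    }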

comment:27 Changed 10 years ago by charles

  • Component changed from Transmission to libtransmission
  • Keywords performance added

comment:28 Changed 10 years ago by motumboe

How do I remove myself from the CC's? Thanks
