Opened 7 years ago

Closed 5 years ago

#5663 closed Bug (fixed)

watchdir event should be edge triggered, not polled.

Reported by: missionsix Owned by: mike.dld
Priority: Normal Milestone: 2.90
Component: Daemon Version: 2.82
Severity: Normal Keywords: daemon libevent watchdir patch
Cc:

Description

Problem:


The current version of the watchdir redirectory code as a number of pitfalls, including:

  • No error checking, this is especially painful if you run out of inotify descriptors, or the directory does exist (typo in config!?)
  • The main loop used to poll every 1 second on the inotify descriptor, or perform a directory scan on other systems (ouch!)
  • Does not take advantage of kqueue on *BSD systems (including osx!)
  • Poorly structured #ifdef code (hard to read)

Solution:


Correctly use the systems event notification libraries (inotify, kevent) where applicable and tie into the daemon event loop. For systems that support it, this reduces spurious wake-ups and directory scans, which eat into system resources, especially on large directories.

The proposed patch is based on my daemon event loop changes which I've attached to that enhancement bug:

https://trac.transmissionbt.com/ticket/5660

Testing done: ============

The following scenarios were tested on these platforms:

Mac OSX 10.9 (kqueue) FreeBSD (kqueue) Ubuntu 10.04 (inotify) Ubuntu 10.04 (generic)

  • Add a .torrent file into watchdir
  • Add an .mp4 file into watchdir
  • With watchdir containing .torrent file, start daemon

Attachments (4)

watchdir-patch.diff (24.6 KB) - added by missionsix 7 years ago.
This patch adds platform specific watchdir code
daemon-event-loop-v2.diff (4.5 KB) - added by missionsix 7 years ago.
daemon event loop patch from ticket #5660
wathdir-patch-v2.diff (27.7 KB) - added by missionsix 6 years ago.
Version 2 of the patch. Adds features
5663-watchdir-v3.patch (63.5 KB) - added by mike.dld 6 years ago.

Download all attachments as: .zip

Change History (26)

Changed 7 years ago by missionsix

This patch adds platform specific watchdir code

Changed 7 years ago by missionsix

daemon event loop patch from ticket #5660

comment:1 Changed 7 years ago by jordan

  • Milestone changed from None Set to 2.83

comment:2 Changed 7 years ago by jordan

missionsix, could you attach an updated watchdir-patch.diff that's synced to trunk?

comment:3 Changed 7 years ago by mike.dld

Issues I see with current (v1) patch version:

  • watchdir_init() fail is treated as critical. It certainly deserves a warning message in log, but I'm not sure if it should prevent daemon from starting (see also next point).
  • If inotify/kqueue initialization fails, I'd like generic implementation to be used as a fallback.
  • Current generic code keeps a list of previously added torrents to not add same torrents over and over again. Note that deleting added torrents is an option, so if it's turned off your new generic implementation would do a lot of unnecessary work each 10 (by default) seconds. Same applies to kqueue implementation which also scans directory on each event.
  • Common complain from users is that Transmission fails to add torrents because it tries to parse .torrent file too soon (before another program which creates the file finishes writing it or changes ownership or access mode). I'd like to see this addressed as well with at least one retry within next N seconds in case metadata parsing failed.
  • In watchdir_init() and watchdir_teardown(), you tend to check for descriptor validity in different ways; be consistent (I prefer != -1 over >= 0 and < 0).
  • Inotify watchdir_init() returns 1 on fail, generic and kqueue return -1; be consistent.
  • Requirement to call watchdir_teardown() even if watchdir_init() failed seems odd, I'd prefer for watchdir_init() to clean up internally right away if it fails.
  • In inotify implementation,
    • in watchdir_event_cb(), ev->len includes terminating null byte, so you should not add 1
    • in watchdir_event_cb(), move nameoffset declaration after ev and use ev to calculate the offset instead of magic involving NULL-pointer casting; or better yet, just use &ev->name (or even buf, since once you got the length you need contents of inotify_event no more)
    • in watchdir_event_cb(), I'd like to see sanity check for whether ev->len would fit into remaining buffer space
    • you rely on libevent to read enough data at once to accomodate at least one inotify record; not sure if this could be guaranteed.

There could be more to it, but I'm tired at the moment to look at it further.

Last edited 7 years ago by mike.dld (previous) (diff)

comment:4 Changed 7 years ago by mike.dld

Almost forgot to mention. Kqueue is actually being used now in Mac OS client (via 3rd-party VDKQueue class, see VDKQueue:receivedNotification:forPath and checkAutoImportDirectory in Controller.m). Maybe we could use the opportunity and leave just one implementation, removing VDKQueue.

Last edited 7 years ago by mike.dld (previous) (diff)

comment:5 Changed 7 years ago by mike.dld

To correct myself, added torrent is either deleted or renamed with ".added" suffix, so it won't be found on next scan (filtering by ".torrent" suffix). However, if the file metadata parsing failed, none of this happens and with proposed patch

  • generic code will find that file on next scan (in 10 seconds by default) and try adding it once again
  • inotify code will not do anything about that file since no more notifications about it would be generated
  • kqueue code will find that file on next scan (upon next "add" filesystem event in watch directory).

I'd still like to see some kind of explicit retry logic.

comment:6 Changed 7 years ago by missionsix

Thanks Mike, i'll address those concerns soon. And I agree with using a single watchdir mechanism, maybe this needs to be moved to libtransmission?

comment:7 follow-up: Changed 7 years ago by jordan

missionsix, any news on this patch?

comment:8 in reply to: ↑ 7 Changed 7 years ago by missionsix

Replying to jordan:

missionsix, any news on this patch?

I'll find some time this weekend. I've got a partial update, but got distracted with other things.

comment:9 Changed 7 years ago by jordan

  • Milestone changed from 2.83 to 2.90

comment:10 Changed 6 years ago by mike.dld

missionsix, please note I'm merely a reviewer and you're not obligated to fix all the mentioned issues at once. I know finding time for free software coding could be an issue, so if you feel like you have some good (not necessarily perfect) code to show — do so, and we'll see whether someone else could assist you on improving it. As an example, getting rid of duplicate kqueue implementations and adding retries could easily be separated into other tickets.

comment:11 Changed 6 years ago by livings124

ping missionsix

comment:12 Changed 6 years ago by mike.dld

@missionsix, another friendly ping ;)

Changed 6 years ago by missionsix

Version 2 of the patch. Adds features

comment:13 Changed 6 years ago by missionsix

v2 adds

  • kqueue/inotify failback to generic scan
  • 3 attempts at parsing the file from onFileAdded, with a notice on failure
  • review feedback addressed, style cleanup
  • other misc stuff

comment:14 Changed 6 years ago by mike.dld

Review for v2 patch:

  • Use global variables as little as possible. Original onFileAdded accepted session argument, you removed it and made use of global mySession instead. Different original watch directory callbacks used dtr_watchdir* as context, you pass DIR* there which leads to odd code which uses context argument to scan the directory, but global gWatchData.watchdir when invoking callback. If/when we move this code into libtransmission it should be flexible enough to handle different sessions, multiple watch directories etc.
  • Fallback to generic watcher is performed by each non-generic one. This is error-prone and leads to code duplication. I was thinking of single fallback code in one place, something like "if we could use inotify, try initializing it; if that failed and we could use kqueue, try initializing it; if that failed, try initializing generic watcher; if that failed, terminate".
  • Same goes for retry logic, it is now duplicated. It would be better to at least factor it out to a single function.
  • It is odd that you always call watchdir_generic_teardown in watchdir_teardown but call to watchdir_generic_init is not always made in watchdir_init.
  • Do not use DIR* and related functions at all. There are now new tr_sys_dir_* functions (declared in file.h) which take care of path encoding internally. If you miss some function (e.g. rewinddir), I could assist in adding it. Moreover, in watchdir_scan_generic:
    • readdir_r may not be present on all systems, but
    • even if it is, using it is meaningless since you can guarantee that dir is only used from one thread here, but
    • even if you still want to use it, struct dirent de; is not a valid output buffer for that function as you don't account for variable (and platform-dependent) d_name size.
  • Do not use usleep. There is tr_wait_msec function (declared in utils.h).
  • Original code used to tr_strdup the directory path, I suggest you do the same.
  • In watchdir_event_cb (watchdir-inotify.c),
    • loc may point outside the buffer: loc += nread vs. loc = ev->name + nread.
    • still no sanity check for whether ev->len would fit the remainder of buf.
  • In watchdir_inotify_init,
    • initial scan failure should probably be a warning, not critical error.
    • no error check for evtimer_add.
    • redundant ret = -1; lines.
    • I don't see a need in special ENOSPC treatment.
  • In watchdir_kevent_cb,
    • I'm not sure whether disabling and re-enabling events could lead to missing notifications, did you test this? It may be better to use mutex to prevent concurrent scans.
    • ctx argument is marked as UNUSED while being used.
  • In watchdir_kqueue_init,
    • kqueue, open, and kevent are documented to return -1 on error, not any < 0 value (cleanup checks are correct nonetheless).
  • Since all init/teardown functions return either 0 or -1, I don't see a need in ret variable as when success execution branch reaches cleanup: label you could just return 0 right away.
  • Redundant return; statements at the end of function here and there.
Last edited 6 years ago by mike.dld (previous) (diff)

comment:15 Changed 6 years ago by mike.dld

Just another note on watchdir_event_cb (watchdir-inotify.c). I wouldn't do nread += bufferevent_read(...) in the loop, as bufferevent_read may return -1 on error and you miss this error check currently. Error check is also missing from outer loop with nread = bufferevent_read, as != 0 could still mean == -1.

comment:16 Changed 6 years ago by mike.dld

Any news, @missionsix? Need a hand?

comment:17 Changed 6 years ago by mike.dld

Closed #4899 as duplicate of this ticket.

Changed 6 years ago by mike.dld

comment:18 Changed 6 years ago by mike.dld

Attached is v3 of the patch with all the comments I had addressed + some unlisted bugs fixed. Also features:

  • move to libtransmission,
  • implementation for Windows,
  • unit tests,
  • non-blocking timer-based retries,
  • support for multiple watch directories.

Known issues (to be addressed before merge):

  • lack of comments,
  • bad error messages.

Would be nice to have in the future:

  • better support for multiple watch directories, i.e.
    • single kqueue for all directories on BSD/Mac,
    • single inotify for all directories on GNU/Linux,
    • single thread for all directories on Windows,
    • single timer for all directories in generic implementation,
  • use tr_watchdir_t in clients (i.e. not daemon).

Thanks @missionsix for the v1 and v2 which've been a good basis for this one. Hope we can improve on v3 further.

Last edited 6 years ago by mike.dld (previous) (diff)

comment:19 Changed 6 years ago by missionsix

@mike.did

Looking good. Sorry I couldn't address your feedback sooner, I did put some work on the changes but I couldn't find time to finish the patch. Thank for polishing it and getting it ready to merge, it's much appreciated.

comment:20 Changed 5 years ago by mike.dld

  • Keywords patch added

comment:21 Changed 5 years ago by mike.dld

  • Owner set to mike.dld
  • Status changed from new to assigned

comment:22 Changed 5 years ago by mike.dld

  • Resolution set to fixed
  • Status changed from assigned to closed

Committed as r14651+r14652. Thanks again, @missionsix.

Note: See TracTickets for help on using tickets.