Opened 6 years ago

Last modified 6 years ago

#5890 reopened Bug

"deleteLocalData" makes assumptions about filesystem; infinite-loops if wrong.

Reported by: thezeusjuice Owned by: jordan
Priority: Normal Milestone: None Set
Component: libtransmission Version: 2.84
Severity: Normal Keywords: Remove, Trash, NFS
Cc:

Description

I noticed this when using an NFS mount as a downloads location, and then tried to "Trash Data & Remove From List" a completed torrent. Traced it to here:

https://trac.transmissionbt.com/browser/trunk/libtransmission/torrent.c#L2949

The function assumes that all filesystems report a "." and ".." entry when a directory is read. Apparently, in some cases this is not true (I tried "ls -a" in the command-line and "." and ".." don't show up there either, when inside the mounted NFS folder.)

Unfortunately, I have −273.15 °C (absolute-zero) knowledge of C so I can't supply a patch or even a suggestion. Just thought you should know.

Change History (9)

comment:1 Changed 6 years ago by mike.dld

The code you're pointing to does not search for "." and ".." entries, instead it skips those entries if they are present.

There was a commit after 2.84 (r14344 by jordan) labeled "fix infinite loop in deleteLocalData()". If you could try out currently nightly and report whether it still fails for you that would be great.

comment:2 Changed 6 years ago by cfpp2p

I have experienced the delete/infinite-loop problem too. Although I can't remember the exact details, I successfully used patch as shown specifically by line 3003, (additionally 3428 and 3467) https://trac.transmissionbt.com/attachment/ticket/5825/userClientSentBadPathV2.diff and it fixed it for my situation. See point 5) #5825. As I said I don't remember exact details so I haven't tested r14344

Last edited 6 years ago by cfpp2p (previous) (diff)

comment:3 Changed 6 years ago by thezeusjuice

  • Resolution set to invalid
  • Status changed from new to closed

After some difficulty I was able to build r14459, but it was still going into an infinite loop. I forced myself to learn a little C, and started copy-pasting "fprintf" everywhere. After recompiling and then running in foreground mode, I've reached the conclusion that for some reason the line:

while ((name = tr_sys_dir_read_name (odir, NULL)) != NULL)

keeps producing the same "name" each time, and since it never reaches the end of the directory listings, it loops infinitely. tr_sys_dir_read_name() is just a simple wrapper around readdir(). So it seems the bug is probably something weird going on in the NFS server (maybe the extra long folder names or spaces in the names are confusing it; I'll look into it tomorrow). Sorry for the trouble! and thank you very much for all the great work you've done on Transmission.

comment:4 Changed 6 years ago by mike.dld

I'd like to get more details on this, so please keep in touch.

comment:5 Changed 6 years ago by thezeusjuice

Well, I whittled down Transmission into a single .c file that reproduces the problem (available here: http://pastebin.com/s9HqTzdS ).

Also, I just noticed that this entire time I never mentioned which NFS server I was using: Specifically, it's a fork of UNFS3, although I've since also replicated the bug with the original UNFS3 (which *does* include "." and ".." entries in folders, so that's definitely not related at all).

Anywho, I've noticed that what happens is that after deleteLocalData() iterates through the folder's contents and calls remove() on each item, instead of receiving NULL from tr_sys_dir_read_name() to signify the end it actually receives the first item once again and thus infinite-loops. However, if the call to remove() is commented out, it does receive the expected NULL and doesn't infinite-loop.

So, for some reason which I have yet to elucidate, the calls to remove() are causing the NFS server to (instead of returning NULL or whatever) restart iteration at the first directory-item, when transmission tries to read past the last directory-item.

I've taken a look at the UNFS3 server code, but it's way much beyond me.

comment:6 Changed 6 years ago by thezeusjuice

Ok, so UNFS3 has a file called readdir.c (http://pastebin.com/a9gArkny). Line 101 seemed kind of suspicious, so commented it out and rebuilt. No more infinte-loop! (as far as I can tell). Don't know why that line was being run only when remove() was being called vs. not being run when remove() wasn't being called. Also have no idea why it was there to begin with (I would think the most appropriate reply to a non-sense client request would be an error, not to try and guess what they might have wanted). Anywho, it seems to be working now (I've tried a few torrents, and had no problems remove&trash-ing them). I guess I'll shoot off a bug report to the UNFS3 people.

So, case closed!

comment:7 Changed 6 years ago by cfpp2p

It may be that line 101 of readdir.c UNFS3 is wrong but I still think that transmission shouldn't allow an infinite depth of sub-directories and limit that number of nested directories to say something like 32,000 (which although still ridiculously high for transmissions purposes I believe) would stop these kinds of errors. I don't like while loops termination strictly dependent on an application or data outside of transmission source.

ref. http://en.wikipedia.org/wiki/Ext4 "Increasing the 32,000 subdirectory limit

In ext3 a directory can have at most 32,000 subdirectories. Ext4 allows an unlimited number of subdirectories.[13] To allow for larger directories and continued performance, ext4 turns on HTree indexes (a specialized version of a B-tree) by default. This feature is implemented in Linux 2.6.23. In ext3 HTrees can be used by enabling the dir_index feature."

comment:8 Changed 6 years ago by cfpp2p

  • Resolution invalid deleted
  • Status changed from closed to reopened

comment:9 Changed 6 years ago by thezeusjuice

Just want to chime in that an upper limit on the number of entries retrieved per directory would be useful as well. For example, the original bug I reported gave the impression of an infinite number of file entries for a single directory. In other words there was no nesting or increasing path depth going on, but it was still locking up Transmission.

Note: See TracTickets for help on using tickets.