Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#5517 closed Bug (fixed)

Don't create or add torrents with "../" at the beginning of the path or "/../" anywhere in the path

Reported by: UncleMeat Owned by: jordan
Priority: Normal Milestone: 2.90
Component: libtransmission Version: 2.82
Severity: Normal Keywords:
Cc: iouqcar@…

Description

(posted previously on the forum)

I'm running transmission-daemon v 2.52 on Debian Stable. While I realize this is a rather old version I've got good reason to believe the problem still exists in newer versions, see below. In addition someone else has reported the exact same issue to me with version 2.82 on ArchLinux?, which is why I've marked this ticket as version 2.82.

When adding certain torrents, transmission-daemon refuses and insteads prints the log message

[torrent name] Invalid metadata entry "path" (metainfo.c:587)"

After digging through the source code a little bit I traced the problem back to the path_is_suspicious function in libtransmission/metainfo.c, line 65 in current trunk.

The code appears to check for the string ../ in any file path, which I assume is intended to prevent nasty tricks with file paths referring to files outside the torrent's designated filetree.

Problem is, it also rejects any path such as My Torrent/Some Files Here.../My File.txt, which is a perfectly sane path. So it should really only check on either ../ at the beginning of a path, or /../ anywhere else in the path.

I hope the above provides enough information; if not I'd be happy to provide more, or test using a newer version.

Attachments (3)

5517-path-sanitizing.patch (2.2 KB) - added by mike.dld 7 years ago.
Patch to inspect each path component separately
5517-path-sanitizing-v2.patch (4.2 KB) - added by mike.dld 7 years ago.
Patch to inspect each path component separately (version 2)
5517-path-sanitizing-v3.patch (4.3 KB) - added by jordan 7 years ago.

Download all attachments as: .zip

Change History (75)

comment:1 Changed 7 years ago by e5g6s

  • Cc iouqcar@… added

Any status on this? Currently there's no way to use folders ending with "..." (Transmission itself will happily create a torrent with such folders, and then fail to open it).

Seems a relatively simple fix. Something like this, maybe:

--- metainfo.c.orig
+++ metainfo.c
@@ -66,7 +66,8 @@
 path_is_suspicious (const char * path)
 {
   return (path == NULL)
-      || (strstr (path, "../") != NULL);
+      || (!strncmp (path, "../", 3))
+      || (strstr (path, "/../") != NULL);
 }
 
 static bool

comment:2 Changed 7 years ago by jordan

  • Milestone changed from None Set to 2.83
  • Status changed from new to assigned

This is a good idea.

comment:3 Changed 7 years ago by jordan

  • Resolution set to fixed
  • Status changed from assigned to closed

Patch added to trunk in r14231.

comment:4 Changed 7 years ago by x190

  • Priority changed from Normal to High
  • Resolution fixed deleted
  • Severity changed from Normal to Critical
  • Status changed from closed to reopened

comment:5 Changed 7 years ago by x190

Last edited 7 years ago by x190 (previous) (diff)

comment:6 follow-up: Changed 7 years ago by mike.dld

Edited to remove the noise.

Last edited 7 years ago by mike.dld (previous) (diff)

comment:7 Changed 7 years ago by x190

  • Summary changed from Error adding torrent with "../" in file path to Don't create or add torrents with "../" at the beginning of the path or "/../" anywhere in the path

comment:8 in reply to: ↑ 6 Changed 7 years ago by x190

Last edited 7 years ago by x190 (previous) (diff)

comment:9 Changed 7 years ago by mike.dld

Edited to remove the noise.

Last edited 7 years ago by mike.dld (previous) (diff)

comment:10 Changed 7 years ago by cfpp2p

I might suggest something like:

static bool
path_is_suspicious( const char * path )
{
    bool isBadPath;
    isBadPath = ( ( path == NULL )
        || ( !strncmp( path, "../", 3 ) )
        || ( strstr( path, "/../" ) != NULL ) );
#ifdef SYS_DARWIN
    if   ( ( !strncmp( path, "..:", 3 ) )
        || ( strstr( path, ":..:" ) != NULL ) )
        isBadPath = true;
#endif
    return isBadPath;
}

since colons are not illegal for Linux. In Windows colon is supported by the API but not by the shell so this works there too.

comment:11 Changed 7 years ago by x190

Specially crafted torrents, such as just2dots, just1slash, just2slashes, just3slashes are bad news for Mac Client v2.82 r14220. Even with no data downloaded, they will do things like trash your download folder and/or create oddly named directories

(e.g. __ftvKib  or ..__sqtePF)

in your download location when selecting "Remove Data File". Transmission then goes into a hang state when a new torrent is added.

I am currently running the following code to avoid loading these EVIL torrents.

static bool
path_is_suspicious( const char * path )
{
  return ( path == NULL )
	 || ( !strncmp( path, ".", 1 ) )
	 || ( !strncmp( path, "/", 1 ) )
	 || ( strstr( path, ".." ) != NULL )
	 || ( strstr( path, "//" ) != NULL )

Test torrents are available , but I feel it would be unwise to attach them here.

Last edited 7 years ago by x190 (previous) (diff)

comment:12 follow-ups: Changed 7 years ago by jordan

x190, that version of path_is_suspicious() wouldn't allow the OP's use case.

comment:13 Changed 7 years ago by jordan

  • Priority changed from High to Normal
  • Severity changed from Critical to Normal

Reverting priority to normal/normal. This isn't a Critical ticket.

comment:14 in reply to: ↑ 12 Changed 7 years ago by x190

Replying to jordan:

x190, that version of path_is_suspicious() wouldn't allow the OP's use case.

".." can wreak a lot of havoc as noted, even without downloading and overwriting files. You need to decide which is more important, and for me, filesystem integrity wins without question.

Last edited 7 years ago by x190 (previous) (diff)

comment:15 follow-up: Changed 7 years ago by x190

---

Last edited 7 years ago by x190 (previous) (diff)

comment:16 in reply to: ↑ 15 ; follow-up: Changed 7 years ago by mike.dld

Replying to x190:

Jordan, here is a simplified version that should get the job done and keep everybody happy.

This version breaks initial functionality of filtering out relative paths. For example, it returns true for "a/../../b".

comment:17 in reply to: ↑ 16 ; follow-up: Changed 7 years ago by x190

---

Last edited 7 years ago by x190 (previous) (diff)

comment:18 in reply to: ↑ 17 Changed 7 years ago by x190

Last edited 7 years ago by x190 (previous) (diff)

comment:19 in reply to: ↑ 12 ; follow-ups: Changed 7 years ago by x190

Last edited 7 years ago by x190 (previous) (diff)

comment:20 in reply to: ↑ 19 Changed 7 years ago by mike.dld

Replying to x190:

True ellipses are one character (… --- option-colon key) and won't be filtered.

You obviously can't expect each and every user to know about this character (not mention using it), no matter how much more correct it is from the ideal world point of view.

comment:21 in reply to: ↑ 19 Changed 7 years ago by e5g6s

Replying to x190:

Replying to jordan:

x190, that version of path_is_suspicious() wouldn't allow the OP's use case.

True ellipses are one character (… --- option-colon key) and won't be filtered.

Sure, but "..." is far more commonly used. And even if it weren't, it shouldn't be restricted when it's a perfectly valid path. There are torrents out there with paths including "..." (that's how I ended up here in the first place, and probably the same for the OP).

comment:22 Changed 7 years ago by cfpp2p

x190's patch filters out paths and files like "..thisFileIsOK" or "...thisDirectoryIsOK/file" plus other valid files.

With filesnames of "." or ".." or "/" (and one easy way to test is by carefully editing the metadata with a text or bencode editor) transmission has proven to hang when the offending torrent is deleted-with-data. Doesn't crash but does produce the "Is a directory" error as depicted by #1684 when downloading. Deletion of the wrong files and directories and creation of randomly named files and directories also occurs.

This patch retains r14231 functionality and additionally filters out all files:

ending with a "/" any file named "." or ".."

static bool
path_is_suspicious( const char * path )
{
    if( path == NULL ) return true;
	
    //do NOT allow backward traverse
    if( !strncmp( path, "../", 3 ) ) return true;
    if( ( strstr( path, "/../" ) != NULL ) ) return true;

    const char * endOfString = strrchr( path, '\0' );

    // - error for all below will be -->  Is a directory  <-- by OS

    // illegal - slash as final
    if( !strcmp( --endOfString, "/" ) ) return true;

    // check for filename is just one or two dots
    if( strlen( path ) > 2 ) {
        if( ( !strcmp( --endOfString, "/." ) ) ) return true;
        if( ( !strcmp( --endOfString, "/.." ) ) ) return true;
        return false;
    }
    if( strlen( path ) == 2 ) {
        if( ( !strcmp( path, ".." ) ) || ( !strcmp( path, "/." ) ) ) return true;
        return false;
    }
    if( ( strlen( path ) == 1 )
        && ( !strcmp( path, "." ) ) ) return true;

    return false;
}

This has been well tested on Linux and Windows. Testing with MAC v2.82 had no issues. The conversion relationship of ":" and "/" for MAC is not a problem.

Last edited 7 years ago by cfpp2p (previous) (diff)

comment:23 follow-up: Changed 7 years ago by cfpp2p

comment:22 patch tested with MAC v2.82 had no issues. The conversion relationship of ":" and "/" for MAC is not a problem.

comment:24 in reply to: ↑ 23 Changed 7 years ago by x190

Replying to cfpp2p:

comment:22 patch tested with MAC v2.82 had no issues. The conversion relationship of ":" and "/" for MAC is not a problem.

I agree. Your comment:22 patch appears to cover all potentially malicious or error-producing sequences.

See comment:43.

Last edited 7 years ago by x190 (previous) (diff)

comment:25 Changed 7 years ago by mike.dld

I'm thinking of slightly different approach. Instead of inspecting whole path, why don't we inspect each separate path component (we surely have path as either list of components or as just a single component) for whether it

  • is null
  • is empty
  • contains path separator (note that this may include not only forward, but also backward slash on Windows)
  • consists of just one dot
  • consists of just two dots

and call this component suspicious in case anything applies? This may make the code cleaner IMHO, but may as well not :). I'll supply the patch later if anyone's interested (and if there'll be no patch already).

Last edited 7 years ago by mike.dld (previous) (diff)

Changed 7 years ago by mike.dld

Patch to inspect each path component separately

comment:26 follow-up: Changed 7 years ago by cfpp2p

replying to mike.dld

why don't we inspect each separate path component

In the vast majority of cases the component would not be suspicions so why not just check the entire path once when done rather than several times, one check for each time a path component is derived.

contains path separator

just having a path separator within the component is not necessarily malicious. For that matter settings.json elements for download-dir, incomplete-dir and when using move data files we can specify these types of elements within the path.

i.e. "download-dir": "/share/hdd/data/16gb/music/./violin///classical", which works as expected

and "download-dir": /share/hdd/data/16gb/music/./violin///classical/..", which I don't think should be allowed since it ends up downloading to violin (as opposed to classical)"

When downloading and uploading torrents with metafiles that include path separators within the actual metadata bencoded path and/or name entry there is no malfunction unless it violates one of the rules in my comment:22 patch.

(note that this may include not only forward, but also backward slash on Windows)

yes, you're right so for solely Windows.

Additionally take note of tr_metainfoGetBasename()

char*
tr_metainfoGetBasename (const tr_info * inf)
{
  size_t i;
  const char * name = inf->originalName;
  const size_t name_len = strlen (name);
  char * ret = tr_strdup_printf ("%s.%16.16s", name, inf->hashString);

  for (i=0; i<name_len; ++i)
    if (ret[i] == '/')
      ret[i] = '_';

  return ret;
}

which currently sanitizes path delimiter and allowing for such, but not banishing 'name' as corrupt or invalid for containing a path separator. For Windows we need: if ((ret[i] == '/') || (ret[i] == '\\'))

Last edited 7 years ago by cfpp2p (previous) (diff)

comment:27 in reply to: ↑ 26 Changed 7 years ago by mike.dld

Replying to cfpp2p:

In the vast majority of cases the component would not be suspicions ...

I don't exactly understand why is that true? I think the patch I supplied solves all the same issues.

... so why not just check the entire path once when done rather than several times, one check for each time a path component is derived.

It makes code cleaner in my opinion (as I said above), allowing one to understand what's happening without even needing comments. I agree with you though, this would not be as performant as checking complete path in the end, but is this code being executed that frequently for us to care?

just having a path separator within the component is not necessarily malicious. For that matter settings.json elements for download-dir, incomplete-dir and when using move data files we can specify these types of elements within the path.

The code in question is only used in one place, for parsing metadata. Settings.json is not related, and is controlled by user more than content of the torrent.

When downloading and uploading torrents with metafiles that include path separators within the actual metadata bencoded path and/or name entry there is no malfunction unless it violates one of the rules in my comment:22 patch.

And once again I'm not sure how are my rules in comment:25 different?.. Please explain.


EDIT: Thinking of performance more, in the worst case (when path is perfectly valid), your patch would now (with additional checks on Windows) call 14 libc functions for each path. My patch would call 3 libc functions per each path component, which is as good as your patch with paths of up to 4 components :) Not going to analyze each libc function complexity, this is just a funny observation.

Last edited 7 years ago by mike.dld (previous) (diff)

comment:28 follow-up: Changed 7 years ago by x190

Last edited 7 years ago by x190 (previous) (diff)

comment:29 in reply to: ↑ 28 ; follow-up: Changed 7 years ago by mike.dld

If I understand your concerns correctly, my path_component_is_suspicious is fine as it is. It doesn't filter out "..." in the beginning (or elsewhere), because the comparison to "." and ".." is for the whole string (strcmp) and not the part of it (strncmp).

Last edited 7 years ago by mike.dld (previous) (diff)

comment:30 in reply to: ↑ 29 Changed 7 years ago by x190

Last edited 7 years ago by x190 (previous) (diff)

comment:31 Changed 7 years ago by mike.dld

IMHO treating ".a" case as suspicious is not valid. But I could consider myself an advanced user, since I know that those files are only hidden by default, and I know ways to make them visible, so they aren't that suspicious to me.

comment:32 Changed 7 years ago by x190

I'd be happy enough if either comment:22 or comment:25 patch made it into trunk.

comment:33 Changed 7 years ago by mike.dld

Attaching a bit improved patch: I've moved root check out of getfile so that it's not being checked each time (the value is always the same anyway). I also couldn't help it but to fix a small memory leak in parseFiles in case of invalid paths.

EDIT: Added fix for tr_metainfoGetBasename (comment:26) too.

Last edited 7 years ago by mike.dld (previous) (diff)

Changed 7 years ago by mike.dld

Patch to inspect each path component separately (version 2)

comment:34 follow-ups: Changed 7 years ago by cfpp2p

Problem is that with using the current ideas for path_component_is_suspicious() the following is determined as suspicious. A/a/b/c/d/e/f/trailer.mkv looks totally un-suspicions to me. My patch at comment:22 tested fine for this one.

http://transmissionbt.net/5517-example-1.JPG

Please note that tr_metainfoGetBasename() allows for slashes and I don't see why be inconsistent with similar issues here and banish for a path structure composed of components presented in the above tested torrent.

'name' could also be like A/B/C and too will be suspicious (but shouldn't be) by the path_component_is_suspicious() whether in single or multi file mode.

tr_metainfoGetBasename() is only used for when creating .torrent and .resume filenames. Without the patch for of comment:26 'name' as for example A\B\C} give errors Unable to save resume file and Unable to save torrent file This is tr_metainfoGetBasename() only use of now.

Last edited 7 years ago by cfpp2p (previous) (diff)

comment:35 in reply to: ↑ 34 Changed 7 years ago by mike.dld

Could you please point me to at least one (or better more than one) real torrent with such structure, and/or at least one real torrent-making program which creates torrents with such structure? I'll be glad to reconsider.

Last edited 7 years ago by mike.dld (previous) (diff)

comment:36 in reply to: ↑ 34 ; follow-up: Changed 7 years ago by rb07

Replying to cfpp2p:

Problem is that with using the current ideas for path_component_is_suspicious() the following is determined as suspicious. A/a/b/c/d/e/f/trailer.mkv looks totally un-suspicions to me. My patch at comment:22 tested fine for this one.

As per the standard (https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure) the path is fine, but the metadata is wrong (i.e. no path separators are allowed). Whatever you are using is just plain wrong.

I think this ticket has started to loose its objective, and is mixing issues. The main objective is to flag suspicious paths, suspicious in a security sense (not dot files, or any other nonsense).

I think Mike.dld is right, with perhaps some exagerations (an empty path component breaks the standard, its not suspicious).

The part about illegal paths perhaps should be addressed somewhere else, variant.c is probably the place where paths are built from the metadata dictionary, and that is where illegal paths should be flagged.

BTW I also think the patch should add proper messages to the log, so users, which really don't care about the details, can get an idea of why a certain metadata, or part of it, was rejected.

comment:37 in reply to: ↑ 36 ; follow-up: Changed 7 years ago by x190

Last edited 7 years ago by x190 (previous) (diff)

comment:38 in reply to: ↑ 37 Changed 7 years ago by rb07

Replying to x190:

That's not what I was talking about, ".." is not a "dot file", its "a special directory entry". Ref: http://en.wikipedia.org/wiki/Dot-file, http://en.wikipedia.org/wiki/Hidden_file_and_hidden_directory

Sorry for contributing (with this response) to the useless set of posts.

comment:39 Changed 7 years ago by x190

In the end, the goal is to improve security and avoid application hangs and other bad behavior. If some of my ideas or responses were incorrect, I don't have a problem acknowledging that. Testing with mike.dld's patch has yielded positive results for me.

comment:43 builds on his great ideas.

Last edited 7 years ago by x190 (previous) (diff)

comment:40 Changed 7 years ago by cfpp2p

Adding to the confusion and futility of this ticket but maybe aiding in the understanding of the standards https://wiki.theory.org/BitTorrentSpecification#Info_in_Multiple_File_Mode quoting the Info in Multiple File Mode section

Info in Multiple File Mode

For the case of the multi-file mode, the info dictionary contains
the following structure:

    name: the file path of the directory in which to store all the files.
          This is purely advisory. (string)
    files: a list of dictionaries, one for each file. Each dictionary in
           this list contains the following keys:
        length: length of the file in bytes (integer)
        md5sum: (optional) a 32-character hexadecimal string corresponding
                to the MD5 sum of the file. This is not used by BitTorrent
                at all, but it is included by some programs for greater
                compatibility.
        path: a list containing one or more string elements that together
              represent the path and filename. Each element in the list
              corresponds to either a directory name or (in the case of
              the final element) the filename. For example, a the file
              "dir1/dir2/file.ext" would consist of three string elements:
              "dir1", "dir2", and "file.ext". This is encoded as a bencoded
              list of strings such as l4:dir14:dir28:file.exte

If I'm reading correctly: name: the file path of the directory in which to store all the files. This is purely advisory. (string) It reads to me name is a path and paths do contain path separators. I still don't see why necessarily banish path separators in all cases. I still support my comment:22 patch's implementation as it does not violate the standards.

'name' could also be like A/B/C and too will be suspicious (but shouldn't be) by the path_component_is_suspicious() whether in single or multi file mode.

comment:41 Changed 7 years ago by mike.dld

I think the only authorative source of wording on this matter is in the original BEP3, which doesn't contain anything about possible values being "paths", but only "names", whether directory or file.

I've just had a small conversation with Bram Cohen and Arvid Norberg over email. I don't want to cite all of it here (but I will if you want me to), so I'll pass you the main idea:

  • It's been agreed on that those values stored in "name" or any "path" component are exactly names, whether directory or file. If the name contains characters having special meaning for a particular file system API, these have to be dealt with (escaped or transformed) in some way to make the name valid for that API. Same goes for cases where file system API rejects some specific names (as it is with names containing trailing spaces/dots on Windows).
  • It's been also agreed on that there could be names which could be considered suspicious/harmful (such as ".."; or "con", "lpt1", "prn", etc. on Windows). We currently reject to work with torrents containing these names (hence the ticket; although only ".." is currently handled), but it's also perfectly acceptable to escape/transform them to eliminate the possible threat.

To make this clear: in particular, if name contains a character (or sequence of characters) which is being treated as path separator by file system API in question, that character (or sequence of characters) should not actually be treated as such by a BitTorrent client, but instead escaped or otherwise transformed to be treated as part of file or directory name by that API. In other words, slashes inside "name" or "path" values are perfectly legal, but they are not path separators.

That said, I don't think my patch is entirely correct as it rejects paths we could've dealt with. On the other hand, I don't think current code is ideal either although at least it solves one issue on one platform. I'm ready to discuss possible improvements (here or somewhere else), and I know you guys may have ideas too, but I'd very much like to know jordan's or livings124's opinion on whether they care at all.

Last edited 7 years ago by mike.dld (previous) (diff)

comment:42 Changed 7 years ago by jordan

I like both of these patches and especially like the collaboration here. :)

IMO the bottom line is this function should sniff out paths that are a security risk. rb07's use case in comment:34 isn't a very likely path, but it's also not a security risk, so the function should let it through.

Of the suggestions so far, I have a mild preference for the patch in comment:22

comment:43 Changed 7 years ago by x190

---to view my 'locked-down' version of this patch, click on 'diff'---

Last edited 7 years ago by x190 (previous) (diff)

comment:44 Changed 7 years ago by mike.dld

Well, as I already noted, my patch could be ignored since it doesn't solve the issue in the right way. I could agree on cfpp2p's patch (comment:22 + comment:26), as it doesn't do any more harm than there already is, and even does resolve some issues on Windows.

All in all, continuation of this ticket should most likely take place in #4753.

comment:45 follow-ups: Changed 7 years ago by jordan

I sat down to patch comment:22 + comment:26 into metainfo.c and found myself coming around to Mike's approach of sniffing out the individual directory components because of how much it simplifies the various test cases for "." and ".." that are in the patch variation in comment:22.

The b/c/d torrent mentioned in comment:34 still seems mostly harmless. However, on balance I think it does merit being called "suspicious" because it both (a) goes against the spirit of the spec and (b) is not something that's generated by any known torrent software.

So instead I started to use Mike's 5517-path-sanitizing-v2.patch and wanted to tweak it a little to elminate some #ifdef'fing and conceptual duplication. Attached is a new draft, 5517-path-sanitizing-v3.patch.

Other than the b/c/d issue, is there any feedback/suggestions on this v3 patch?

Changed 7 years ago by jordan

comment:46 in reply to: ↑ 45 Changed 7 years ago by x190

Replying to jordan:

is there any feedback/suggestions on this v3 patch?

I agree with this approach regardless of the fact that taking this stance apparently caused the sun to reverse its course in the sky 3 months ago. :(

However, I would like to see a check for "~" added as it causes the same issues as "." and "..", at least on os x ("~" = home directory).

+ || (strcmp (component, "~") == 0
Last edited 7 years ago by x190 (previous) (diff)

comment:47 in reply to: ↑ 45 ; follow-up: Changed 7 years ago by mike.dld

You won't hear "no" from me :) But...

Just to try to prove myself wrong I made a little research (back to the times of comment:34) I'd like to share. I downloaded [all the available at the moment] 3,262,466 .torrent files (~99 GiB) from TPB (yes, this took a while) and inspected them. Here's the part of the results relevant to current discussion:

  • 62 .torrent files have backslash "\" in path component(s)
  • 14 .torrent files have slash "/" in path component(s)

With my patch, these 0.000023% of torrents have suspicious path components. If, with these stats in mind, you still want to use my patch, you're kindly welcome.

Example path components containing backslash (excluding possibly copyrighted material):

\.ogg
myspace.com\razorfistmetal.desktop
Nietzsche\Nietzsche - Ecce homo.pdf
SKINNER,B.F.\ -\ Sobre\ o\ Behaviorismo.pdf

Example path components containing slash (excluding possibly copyrighted material):

http://jupitershare.blogspot.com/mgb_ryder_skye_1000.mp4
/root/pa.zip
inbound/8.0-CURRENT-200801-i386-bootonly.iso
./CT2.3_PTS_152.zip
<script>alert('xss');</script>.txt
<a href='#' onClick='alert(42);'>foobar?!</a>

comment:48 Changed 7 years ago by jordan

I'm curious how you scraped all the metainfo off TPB, but maybe that's another story...

Several of the examples listed in comment:45 are items we'd want to filter out anyway. I suspect the ones containing HTML of being crafted by pranksters, and "/root/pa.zip" is bad as well.

comment:49 Changed 7 years ago by jordan

  • Resolution set to fixed
  • Status changed from reopened to closed

5517-path-sanitizing-v3.patch committed in r14264.

comment:50 Changed 6 years ago by x190

  • Milestone changed from 2.83 to None Set
  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:51 follow-up: Changed 6 years ago by x190

Apparently ruTorrent likes to create torrents with metainfo like "pathl0:12:r2r-1459.sfv" which results in an "Invalid data" error using the patch committed in r14264. To accommodate ruTorrent we should remove Line 79 "(*component == '\0')" from path_component_is_suspicious() in metainfo.c.

https://forum.transmissionbt.com/viewtopic.php?f=4&t=15912&p=68363#p68363

Last edited 6 years ago by x190 (previous) (diff)

comment:52 Changed 6 years ago by mike.dld

It is also true that they have a registered issue report for that matter: Zero length file entry create in file list when creating a torrent from directory/folder

comment:53 Changed 6 years ago by mike.dld

And here I am again with TPB statistics (I now have 3,495,213 torrent files) on empty path components (full stats):

Creator Count #1 #2 #3 #4
Unknown (no "created by" field) 91 16 73 2
18 (hdreactor.org) 1 1
Azureus (various versions, up to 4.7.1.2) 24 5 12 19
BitComet (various versions, up to 1.36) 161 1 160
BitLord (various versions, up to 1.1) 18 18 1
Blitz2012 (hdreactor.org) 1 1
(filmoff.net) 1 1
http://durvallskingdom.com/ 1 1 1
http://pornoshara.tv 4 4
Jahed 1 1
mktorrent 1.0 21 21
ruTorrent (PHP Class - Adrien Gibrat) 8701 26 8675
Transmission (various versions, up to 2.82 (14160)) 42 42
uTorrent (various versions, up to 3230) 74 72 2 2
why are you looking her? 1 1
www.youtonightband.com 1 1
Total: 9143 150 8791 195 23

This is about 0.00262% of all TPB torrent files, with 0.00249% created by (obviously broken) ruTorrent.

I have some more interesting stats on empty path components (exactly, where in path are they situated) but am in a bit hurry, so will post it later.

EDIT: Fixed old stats (removed false-positives), added more stats:

  1. paths consisting of a single empty component
  2. paths beginning with empty component
  3. paths ending with empty component
  4. paths containing empty component somewhere in the middle

Some numbers don't sum up as the cases intersect. From my POV, all the cases should be fixed and not tolerated. And as it turned out, in Transmission itself as well.

Last edited 6 years ago by mike.dld (previous) (diff)

comment:54 in reply to: ↑ 47 Changed 6 years ago by cfpp2p

Replying to mike.dld:

@mike.dld

Just to try to prove myself wrong I made a little research

there is another case showing up on the forums https://forum.transmissionbt.com/viewtopic.php?f=4&t=15912&p=68358#p68358

and maybe you can set your fancy scripts on this one to see what the percentages are.

You won't hear "no" from me :) But...

I still think as jordan started out with comment:45 that patch comment:22 + comment:26 into metainfo.c is the best way to go.

At any rate, we are now getting false positives with v2.83.

comment:55 follow-up: Changed 6 years ago by cfpp2p

https://forum.transmissionbt.com/viewtopic.php?f=4&t=15912&p=68358#p68358

Replying to mike.dld:

@mike.dld

Just to try to prove myself wrong I made a little research

and maybe you could set your fancy scripts on this one to see what the percentages are for this one.

You won't hear "no" from me :) But...

I still think as jordan started out with comment:45 that patch comment:22 + comment:26 into metainfo.c is the best way to go. My code does not flag this metadata structure invalid, which I agree is the correct result.

Replying to mike.dld:

It is also true that they have a registered issue report for that matter...

ruTorrent is a front-end for the popular Bittorrent client rTorrent.

Other clients, not inclusive to rTorrent, allow this structure, there will be these types of metadata in the wild now, and it doesn't look _is_suspicious. I believe transmission should not flag this type metadata corrupt or invalid and exclude itself from processing these types of torrents.

At any rate, we are now getting false positives with v2.83.

comment:56 Changed 6 years ago by mike.dld

Updated comment:53.

comment:57 in reply to: ↑ 55 Changed 6 years ago by mike.dld

Replying to cfpp2p:

and maybe you could set your fancy scripts on this one to see what the percentages are for this one.

See comment:53.

Let me state my position so that it's clear. The metadata currently in dispute is not valid, as there is no filesystem which allows for creating files or directories with empty names. If you read my comment:41 carefully, and BEP3 for that matter, you'll notice that "path" value consists of file/directory names. If there were some characters having special meaning we could've escaped, we would've done so (sometime in the future as continuation of this or some other issue), but there are none, so we leave those names as is and they are still invalid afterwards. The fact that people creating such torrents seem to depend on BitTorrent clients first concatenating path components and only then doing the cleanup doesn't make the metadata valid. Here're a few assumptions those people make:

  1. "/path/to//file" (empty components in the beginning/middle of path) is good path, since most (?) *NIX systems would ignore/collapse several path separators in a row (not true on Windows).
  2. "/path/to/file/" (empty components in the end of path) is good path (note here that it's a file path), because it would contain a slash in the end, so would lead to directory being created and not file.
  3. "http://example.com/" (empty components in the middle/end of path) is not exactly good path, but some BitTorrent clients display concatenated file paths in a list, and it would look awesome to display our URL there.
Last edited 6 years ago by mike.dld (previous) (diff)

comment:58 Changed 6 years ago by jordan

I've separated out the issue of those 42 invalid paths created by Transmission into its own ticket, bug #5714.

Even if the metainfo isn't strictly valid, I don't see any problem with silently ignoring empty components in metainfo's "path" array so long as there's at least one nonempty component in the array. Obviously it would be nice to handle .torrent files that were actually created by Transmission ;) but more importantly, ruTorrent's generated two orders of magnitude more .torrent files that have the same issue.

WRT allowing these torrents, I think it would be simpler would be to patch the existing code along the lines suggested by x190 in comment:51.

comment:59 follow-up: Changed 6 years ago by jordan

In r14293 I've added code to quietly accept (and ignore) empty components, so long as not /all/ of the components are empty.

I've also added unit tests to test these edge cases.

Comments welcome...

comment:60 follow-up: Changed 6 years ago by mike.dld

Just ignoring empty components would cause issues with case 3. We also need to completely ignore paths with empty component(s) at the end. Examples in comment:57 are taken from real torrents, I didn't make them up ;)

Last edited 6 years ago by mike.dld (previous) (diff)

comment:61 in reply to: ↑ 59 Changed 6 years ago by cfpp2p

Replying to jordan:

In r14293 I've added code to quietly accept (and ignore) empty components, so long as not /all/ of the components are empty.

Comments welcome...

I think this is the correct way to go. Thanks jordan :)

Replying to mike.dld :

And here I am again with TPB statistics

It's really great to have all this excellent detailed analysis magic!

plumsmooth on the forums gave all of us this recent great fun to begin with...

x190 supports the project and this issue like no one else, both here and on the forums.

Last edited 6 years ago by cfpp2p (previous) (diff)

comment:62 in reply to: ↑ 60 ; follow-up: Changed 6 years ago by x190

Replying to mike.dld:

Just ignoring empty components would cause issues with case 3. We also need to completely ignore paths with empty component(s) at the end. Examples in comment:57 are taken from real torrents, I didn't make them up ;)

I guess all of this assumes that is what the creator wants. In case 3, we assume the torrent creator wants a file, not a folder, and the empty component at the end is just a result of bad torrent creation code? I agree that these torrents are invalid, but apparently we are accomodating ruTorrent and BitComet? and others.

Last edited 6 years ago by x190 (previous) (diff)

comment:63 in reply to: ↑ 62 ; follow-up: Changed 6 years ago by mike.dld

Replying to x190:

In case 3, we assume the torrent creator wants a file, not a folder, and the empty component at the end is just a result of bad torrent creation code?

As I understand it now, after analysing several such torrents, BitComet (and BitLord) creates path ending with empty component when there is a directory with no files inside. This way, they are able to not only create empty files, but also empty directories. So it's not a bad torrent for BitComet, but I don't know how other BitTorrent clients handle this kind of data.

All Azureus and uTorrent case 3 torrents are there because of URL paths (see comment:57) and may be ignored.

comment:64 in reply to: ↑ 63 ; follow-up: Changed 6 years ago by x190

Replying to mike.dld:

Replying to x190:

In case 3, we assume the torrent creator wants a file, not a folder, and the empty component at the end is just a result of bad torrent creation code?

As I understand it now, after analysing several such torrents, BitComet (and BitLord) creates path ending with empty component when there is a directory with no files inside. This way, they are able to not only create empty files, but also empty directories. So it's not a bad torrent for BitComet, but I don't know how other BitTorrent clients handle this kind of data.

All Azureus and uTorrent case 3 torrents are there because of URL paths (see comment:57) and may be ignored.

You should be able to test this (path ending with empty component) in Transmission using the latest build. In this scenario, I see a file being listed, not a directory, however, this may not be a problem in itself? Length may need to be non-zero?? See: #5716.

On another topic, what are your thoughts regarding excluding "~" as a standalone filename? When I try to create such a file in TextEdit, I am warned that I will overwrite my home folder. Not good! I seem to be able to use "~" as a filename in the Finder without ill effects, though.

Last edited 6 years ago by x190 (previous) (diff)

comment:65 in reply to: ↑ 64 Changed 6 years ago by mike.dld

Replying to x190:

On another topic, what are your thoughts regarding excluding "~" as a standalone filename?

Tilde is usually being expanded by shell (not by filesystem API itself), unless it's escaped in one or another way; it is also most of the time being expanded by various system open/save dialogs for the sake of convenience. So basically it's not a "special" name compared to "." and "..". Since Transmission doesn't use shell to operate files and directories, it should be fine. As for user impact, I myself wouldn't be confused seeing this kind of entry, but as I said somewhere already, I can't be considered an ordinary user so may not have an objective opinion here.

comment:66 in reply to: ↑ 51 Changed 6 years ago by cfpp2p

This issue remains open, incomplete and in need of resolution on at least this count.

Replying to x190:

Apparently ruTorrent likes to create torrents with metainfo like "pathl0:12:r2r-1459.sfv" which results in an "Invalid data" error using the patch committed in r14264. To accommodate ruTorrent we should remove Line 79 "(*component == '\0')" from path_component_is_suspicious() in metainfo.c.

https://forum.transmissionbt.com/viewtopic.php?f=4&t=15912&p=68363#p68363

. .

. .

Another recent post indicates the issue remains a continuing alienating factor for transmission users:

https://forum.transmissionbt.com/viewtopic.php?f=2&t=16626

...MORE than 20,000 .torrents...

. .

http://computerfixpro.com/torrent-wont-load-transmission.JPG

And the issue appears not yet fixed by the ruTorrent people. https://github.com/Novik/ruTorrent/issues/983

So there might be an increasingly greater number of torrents with this format. I see no reason whatsoever to not accommodate these types of fully functional and harmless torrents. x190's patch (untested by me), my original patch set (well tested) or some yet unspecified patch should be implemented. In this way we do no harm, add happy transmission users and promote a better image of the project as a whole.

comment:67 follow-up: Changed 6 years ago by mike.dld

That's an exaggeration. "MORE than 20,000" was mentioned as a total number of torrents in the library, not number of torrents having this particular [mostly ruTorrent-induced] issue. AFAICT jordan has already resurrected support for those torrents (comment:59, r14293) by changing code to ignore empty path components, so it's not like nobody did nothing about it. The change wasn't released yet, that'a another story. I'm also wondering why was µTorrent mentioned in that post... did they actually introduce the same behavior as we?..

As for ruTorrent itself, I thought the issue was fixed long ago. I didn't check myself, but if you're saying it's not fixed to date then you probably have more information than I do.

comment:68 Changed 6 years ago by cfpp2p

(comment:59, r14293)

so sorry, I forgot. even I said it looked good comment:61

I apologize for this.

20,000 Ok, your reading skills we not skewed like mine :-(

I stand corrected, we've got a good fix here for the next release as far as I can tell.

comment:69 in reply to: ↑ 67 Changed 6 years ago by cfpp2p

  • Milestone changed from None Set to 2.90

Replying to mike.dld:

That's an exaggeration. "MORE than 20,000" ...

I'm also wondering why was µTorrent mentioned in that post... did they actually introduce the same behavior as we?..

Answers to these questions are clarified in a recent forum post: https://forum.transmissionbt.com/viewtopic.php?f=2&t=16626#p70007

"NON-standards compliant" .torrent comprised ANYTHING less than 25% of our complete archive

After reading dozens of "official" µTorrent support forum posts by their developers in regard to this issue, here are a handful of posts that I feel pretty sums up their stance on this matter...

comment:71 follow-up: Changed 6 years ago by mike.dld

  • Resolution set to fixed
  • Status changed from reopened to closed

I believe it's already there, see comment:59 (r14293).

comment:72 in reply to: ↑ 71 Changed 6 years ago by x190

I guess I meant 'let's not forget it this time'. It should have been in v2.84. Onward to v2.90 --- banish forum posts like this!!!!!!!!!!!!!

Note: See TracTickets for help on using tickets.