Opened 3 years ago

Last modified 3 years ago

#6067 new Bug

path_component_is_suspicious() allows 0-chars (0x00)

Reported by: cfpp2p
Owned by: jordan
Priority: Normal
Milestone: None Set
Component: libtransmission
Version: 2.84
Severity: Normal
Keywords:
Cc:

Description

In metadata.c getfile(), a metadata path entry that contains 0-chars is not flagged as suspicious. It might be better either to flag it or to substitute another character for the 0-char (strip_non_utf8() substitutes "?" for non-UTF-8 bytes). path_component_is_suspicious() checks a string, so any substitution would need to happen before that check to ensure that nothing suspicious following a substituted 0-char passes through.
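A minimal sketch of the two options, assuming the path component is available together with its bencoded length. Both helper names are hypothetical and are not part of libtransmission; the point is only that the check has to use the explicit length, because strlen() stops at the first 0-char.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libtransmission API): reports whether a
 * length-delimited path component contains an embedded 0x00 byte.
 * memchr() scans exactly len bytes, so it also sees bytes that follow
 * a 0-char. */
static bool component_has_embedded_nul(const char *component, size_t len)
{
    return memchr(component, '\0', len) != NULL;
}

/* Hypothetical alternative: replace every 0-char with '?' in place,
 * similar in spirit to the '?' substitution described for
 * strip_non_utf8(). Returns the number of bytes replaced. The result
 * should still be run through the suspicious-path check afterwards. */
static size_t replace_embedded_nuls(char *component, size_t len)
{
    size_t replaced = 0;

    for (size_t i = 0; i < len; ++i) {
        if (component[i] == '\0') {
            component[i] = '?';
            ++replaced;
        }
    }
    return replaced;
}
```

With the first helper a component such as "dir", 0x00, "../etc" would be flagged; with the second it becomes "dir?../etc", which a later check can then inspect in full.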

Right now transmission ultimately uses a truncated directory structure consisting of everything before the first 0-char. When the metadata describes multiply nested directories, that truncated path is used as the filename. All following directories and the final filename are completely ignored, which might not be the best solution.

I can verify that transmission downloads and seeds from the resulting truncated directory(s)/filename ignoring everything after the first 0x00.
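The truncation can be illustrated with a small standalone sketch. It is not Transmission's actual code: struct component and join_components() are hypothetical names, and the sketch only assumes that raw component bytes are joined with '/' and the result is later treated as an ordinary C string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one bencoded path component:
 * raw bytes plus the explicit length taken from the metadata. */
struct component {
    const char *bytes;
    size_t len;
};

/* Joins n components with '/' into out (capacity cap), copying the raw
 * bytes including any 0-chars, and appends a terminating 0x00.
 * Returns the number of bytes written before the terminator, or 0 if
 * the result does not fit. */
static size_t join_components(const struct component *parts, size_t n,
                              char *out, size_t cap)
{
    size_t pos = 0;

    if (cap == 0)
        return 0;

    for (size_t i = 0; i < n; ++i) {
        size_t need = parts[i].len + (i > 0 ? 1 : 0);

        if (pos + need + 1 > cap)
            return 0;
        if (i > 0)
            out[pos++] = '/';
        memcpy(out + pos, parts[i].bytes, parts[i].len);
        pos += parts[i].len;
    }
    out[pos] = '\0';
    return pos;
}
```

Joining the components "dir1", "di" 0x00 "r2" and "file.bin" writes 19 bytes, but strlen() on the result reports 7 and any C string API sees only "dir1/di". The rest of that component, the following directory separator and the final filename are silently dropped, which matches the behaviour described above.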

Change History (5)

comment:1 follow-up: Changed 3 years ago by x190

Isn't metadata required by the protocol to be UTF-8? If the creating client can't get it right, we should just reject the entire torrent or, if not, why not?

comment:2 in reply to: ↑ 1 Changed 3 years ago by cfpp2p

Replying to x190:

Isn't metadata required by the protocol to be UTF-8? If the creating client can't get it right, we should just reject the entire torrent or, if not, why not?


There have been multiple attempts to allow torrents with non-UTF-8 metadata (see references below).
utils.c strip_non_utf8() simply replaces non-UTF-8 bytes with ? (question mark, 0x3F). This simplistic approach currently leads to problems similar to old bug #3397. A better approach seems to be to allow ISO-8859-1 or 8859-15 (https://en.wikipedia.org/wiki/ISO/IEC_8859-15#Differences_from_ISO-8859-1), or, following ticket:4882#comment:11, to mask the bits to produce printable ASCII characters (http://www.ascii-code.com/). With ISO-8859-1 allowed, I've found torrents like that of #1675 to function correctly. Simply replacing with ?s turned different filenames of the same length into identical filenames composed of sequences of ?s. Downloading then results in a problem similar to #3397.

Having torrent names and/or filenames composed of sequences of ?s is confusing and buggy. However, this can be remedied, improving usability and solving bugs at the same time, all without disallowing the torrent completely.
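To show why interpreting the bytes as ISO-8859-1 avoids the colliding names, here is a rough sketch of a Latin-1 to UTF-8 conversion. latin1_to_utf8() is a hypothetical name, not an existing libtransmission function, and the sketch assumes the input really is ISO-8859-1; a real patch could just as well use iconv.

```c
#include <stddef.h>

/* Hypothetical helper: converts in_len bytes of ISO-8859-1 to UTF-8.
 * Every Latin-1 byte maps to the Unicode code point of the same value,
 * so bytes below 0x80 are copied and bytes 0x80..0xFF become two UTF-8
 * bytes. Writes a terminating 0x00 and returns the number of bytes
 * written before it, or 0 if out is too small (also 0 for empty input).
 * Note: bytes 0x80..0x9F map to C1 control characters, and a 0-char in
 * the input is copied through unchanged, so it still needs separate
 * handling. */
static size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                             char *out, size_t cap)
{
    size_t pos = 0;

    for (size_t i = 0; i < in_len; ++i) {
        unsigned char c = in[i];
        size_t need = (c < 0x80) ? 1 : 2;

        if (pos + need + 1 > cap)
            return 0;
        if (c < 0x80) {
            out[pos++] = (char)c;
        } else {
            out[pos++] = (char)(0xC0 | (c >> 6));   /* lead byte */
            out[pos++] = (char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    if (pos + 1 > cap)
        return 0;
    out[pos] = '\0';
    return pos;
}
```

Two names that differ only in a non-ASCII byte, for example 'f' 0xE9 and 'f' 0xE8, stay distinct after this conversion, whereas replacing each such byte with ? would turn both into "f?".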

references:
#1634
#1675
r7656
r7657

#3397
r10963

https://en.wikipedia.org/wiki/ISO/IEC_8859-15#Differences_from_ISO-8859-1
http://man7.org/linux/man-pages/man3/iconv_open.3.html

ticket:6064#comment:14

ticket:4882#comment:11

comment:3 follow-up: Changed 3 years ago by x190

Can you produce a patch that will not leave us vulnerable to SFSDS or MMCS (Sudden FS disappearance syndrome or Massive memory corruption syndrome)?

comment:4 in reply to: ↑ 3 Changed 3 years ago by cfpp2p

Replying to x190:

Can you produce a patch that will not leave us vulnerable to SFSDS or MMCS (Sudden FS disappearance syndrome or Massive memory corruption syndrome)?

Way back in the dark ages, r758 introduced the idea of "Use UTF-8 encoded name and paths in torrent file if available". It's unclear to me why development and support for the other corresponding nonstandard metainfo fields (i.e. codepage) weren't pursued.
--> Many dictionary keys also have a counterpart suffixed with .utf-8, and in a few instances with .utf8, e.g. "name.utf-8".
An example of such an inconsistency: when the name field begins with a 0-char, the set name field becomes the torrent filename (#542 r4437). However, if the 0-char is not the first character, we truncate everything following the first 0-char and the name becomes the preceding characters.

Seems like this ticket has become a bit off topic. I'll leave it to the dev(s) to decide if or how to patch path_component_is_suspicious() for the processing of 0-chars.

comment:5 Changed 3 years ago by x190

But can I name my torrent "ä⁄çH®Œ÷‹RNt-ÅB{°'Nc®+¬î”,PµPˇ∞Ô¥l£hUNÑïß“xYfëufl$ö#8Ï U´}yPPÎAø5=Ç0ä–t´Ï+É[ÆTÀÚ.îrÚø™0wõ=q˜0≈˚‰ìåıíîkêö&¬ÜoT˜¬¬{l˘Yçë–≥Ósÿdó>y˛‰)Ãè∫…ì6DÎ" or "‹ÈƒL"˛◊@Ö<gl≈ËÙë≥-7;b≠Q§ÂªõlÖπ?,π4¯è‹ñõÉEŒ¶ötk™[NÕ›&áêÁ¯«!ó˚Á∑pY€PÓ›j∂È=ûÎËßà˜#æfi¥fl)Ÿó¥~öøtE&ƒ©Z¶`88ΩW¡∫¥•7táuÃˇEî.Îgsôâ∂òçÆJÚB\∂"? If I do, will my computer blow up or simply suffer from "SFSDS or MMCS"? I think I need some more "MMJ" (Medic?l M?ry J?ne).

Note: the preceding is \t-encoded (trac encoded). I wanted to post a bunch of "⚃⚃⚃⚃", only with 4 zeros, but they kept disappearing. Should I have cleaned them with UTF? Will this post self-destruct in 5-4-3-2-1... ? I('m)SO confused! *sigh*

Last edited 3 years ago by x190