Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#6064 closed Bug (fixed)

Segmentation fault when opening a torrent file

Reported by: d33tah Owned by: mike.dld
Priority: High Milestone: 2.90
Component: Transmission Version: 2.84
Severity: Major Keywords:
Cc:

Description

Torrent file, base64-encoded:

ZDg6YW5ub3VuY2UzNTp1ZHA6Ly90cmFja2VyLm9w ZW5iaXR0b3JyZW50LmNvbTo4MDEzOmNyZWF0aW9u IGRhdGVpMTNlNDppbmZvZDY6bGVuZ3RoaTIwZTQ6 bmFtZTEwOvQAko9sZS50eHQxMjpwaWVjZSBsZW5n dGhpNjU1MzZlNjpwaWVjZXMyMDpcxeZSvg3m8ngF swRk/5sA9InwyTc6cHJpdmF0ZWkxZWVl

Found when fuzzing transmission-show with this seed file:

ZDg6YW5ub3VuY2UzNTp1ZHA6Ly90cmFja2VyLm9w ZW5iaXR0b3JyZW50LmNvbTo4MDEzOmNyZWF0aW9u IGRhdGVpMTMyNzA0OTgyN2U0OmluZm9kNjpsZW5n dGhpMjBlNDpuYW1lMTA6c2FtcGxlLnR4dDEyOnBp ZWNlIGxlbmd0aGk2NTUzNmU2OnBpZWNlczIwOlzF 5lK+DebyeAWzBGT/mwD0ifDJNzpwcml2YXRlaTFl ZWU=

Fuzzing was not complete, so there might be more crashes like this.

Attachments (1)

base64-decode-torrents.zip (782 bytes) - added by cfpp2p 7 years ago.
v2.84 r14309 base64-decode-torrents no faults for both add torrent or transmission-show.

Download all attachments as: .zip

Change History (17)

Changed 7 years ago by cfpp2p

v2.84 r14309 base64-decode-torrents no faults for both add torrent or transmission-show.

comment:1 Changed 7 years ago by cfpp2p

  • Version changed from 2.84 to 2.84+

comment:2 follow-up: Changed 7 years ago by d33tah

@cfpp2p: what's the status of this one?

comment:3 in reply to: ↑ 2 Changed 7 years ago by cfpp2p

Replying to d33tah:

what's the status of this one?

It's up to the developers.

comment:4 Changed 7 years ago by d33tah

Could I request the developer's opinion on this ticket?

comment:5 Changed 7 years ago by Robby

I think you did by creating this ticket. ;-)

comment:6 Changed 7 years ago by d33tah

@Robby: well, I found that there's a segmentation fault, but I'm not sure if there are any non-DoS security implications of this one. It would be nice to find out how severe this one actually is.

comment:7 Changed 7 years ago by mike.dld

Are you testing with assertions enabled? I'm only seeing an assertion hit (not segmentation fault), and nothing at all with assertions disabled (tested with Qt client).

Anyway, the issue seems to be in isLegalUTF8() (ConvertUTF.c) which looks outdated (original source ceased to exist, or at least I wasn't able to find it at http://www.unicode.org/Public/PROGRAMS/CVTUTF/). The test case has a string of length 10 which begins with f4 00 92 8f, which is not a valid UTF-8 sequence (valid sequences don't contain 0-chars, as all the trailing bytes should be between 0x80 and 0xBF), but isLegalUTF8() misses this fact, as it only checks that second byte is not greater than 0xBF, and then additionally (because first byte is 0xF4) not greater than 0x8F. Then assertion is made with tr_utf8_validate() that the sequence is legal, but second time -1 is passed instead of 10 as sequence length, which causes real length to be calculated with strlen() to be 1, and f4 is not a valid UTF-8 sequence on its own.

If not the different lengths applied to the same sequence, we might have never noticed this. Judging from the code, this is also not the only sequence to fool the check and pass the invalid data through.

comment:8 Changed 7 years ago by mike.dld

In those variants of isLegalUTF8() that I've found, the line (as it appears in Transmission's code)

    case 2: if ((a = (*--srcptr)) > 0xBF) return false;

is written as

    case 2: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false;

So it must have been long fixed...

comment:9 Changed 7 years ago by mike.dld

  • Owner set to mike.dld
  • Status changed from new to assigned

comment:10 Changed 7 years ago by mike.dld

  • Milestone changed from None Set to 2.90
  • Resolution set to fixed
  • Status changed from assigned to closed
  • Version changed from 2.84+ to 2.84

Fixed in r14676, added test cases in r14677. Thanks!

comment:11 follow-up: Changed 7 years ago by d33tah

@mike.dld: I believe that transmission could use some more fuzzing, but I don't have the CPU power to do that. Would you like to help? I could explain the process, it's really simple.

comment:12 follow-up: Changed 7 years ago by cfpp2p

It's not necessary to use a variant of isLegalUTF8(). The problem with the assertion failure stemmed from the way tr_utf8clean() is called from metainfo.c
r13667 "refactor libtransmission's tr_benc class as tr_variant." changed the tr_utf8clean() calls from metainfo.c
tr_utf8clean( str, -1 ) to tr_utf8clean (str, len)
It was the original purpose of tr_utf8clean( str, -1 ) to flag a call of strlen() from tr_utf8clean() before that first tr_utf8_validate() call, every time. It meant that string length was unknown. strlen() computes the length of the string up to, but not including the first "0-char" (the terminating null character).
Calls from metainfo.c could look like this: tr_utf8clean (str, TR_BAD_SIZE) even though in this case it's just unknown size, not bad size.

char*
tr_utf8clean (const char * str, size_t max_len)
{
  char * ret;
  const char * end;

  if (max_len == TR_BAD_SIZE)
    max_len = strlen (str);

  if (tr_utf8_validate (str, max_len, &end))
    ret = tr_strndup (str, max_len);
  else
    ret = to_utf8 (str, max_len);

  assert (tr_utf8_validate (ret, TR_BAD_SIZE, NULL));
  return ret;
}

This way the first call to tr_utf8_validate() properly fails. Then we will get a call of to_utf8(), as we should, and have strip_non_utf8() replace the invalid utf8 with "?" (question mark(s)).

Using tr_utf8clean (str, len) returns len derived from tr_variantDictFindStr()->tr_variantGetStr() which len is the raw derived element length of struct tr_variant_string, not necessarily that of a properly null terminated string.

By using tr_utf8clean (str, TR_BAD_SIZE), which flags a call to strlen(), the failed assertion goes away without using any variation of isLegalUTF8().

comment:13 in reply to: ↑ 12 Changed 7 years ago by mike.dld

Replying to cfpp2p:

Calls from metainfo.c could look like this: tr_utf8clean (str, TR_BAD_SIZE) even though in this case it's just unknown size, not bad size.

No they cannot, because there is no notion of "null-terminated string" in bencoded data, thus length should always be provided explicitly.

By using tr_utf8clean (str, TR_BAD_SIZE), which flags a call to strlen(), the failed assertion goes away without using any variation of isLegalUTF8().

And now you're proposing to leave the evident defect as it is?.. :)

comment:14 follow-up: Changed 7 years ago by cfpp2p

Prior to r14619 strip_non_utf8() used evbuffer_add() to place char zero = '\0' at the end of the string. With r14619 evbuffer_add (buf, &zero, 1) was removed.

utils.c 14613

static char*
strip_non_utf8 (const char * in, size_t inlen)
{
  const char * end;
  const char zero = '\0';
  struct evbuffer * buf = evbuffer_new ();

  while (!tr_utf8_validate (in, inlen, &end))
    {
      const int good_len = end - in;

      evbuffer_add (buf, in, good_len);
      inlen -= (good_len + 1);
      in += (good_len + 1);
      evbuffer_add (buf, "?", 1);
    }

  evbuffer_add (buf, in, inlen);
  evbuffer_add (buf, &zero, 1);
  return evbuffer_free_to_str (buf);
}

I'm proposing that we take a very close look at where the problem(s) might reside to do the best we can to resolve them thoughtfully and to use some more fuzzing.

other:
http://unicode.org/forum/viewtopic.php?f=9&t=90
http://site.icu-project.org/

comment:15 in reply to: ↑ 14 Changed 7 years ago by mike.dld

Replying to cfpp2p:

Prior to r14619 strip_non_utf8() used evbuffer_add() to place char zero = '\0' at the end of the string. With r14619 evbuffer_add (buf, &zero, 1) was removed.

Correct, because evbuffer_free_to_str() called inside of strip_non_utf8() guarantees that the result will be null-terminated, so appending one more null character doesn't make any sense.

I'm proposing that we take a very close look at where the problem(s) might reside to do the best we can to resolve them thoughtfully and to use some more fuzzing.

other:
http://unicode.org/forum/viewtopic.php?f=9&t=90
http://site.icu-project.org/

I don't think using ICU (which is a full-blown Unicode library) for our simple needs would be justified. And I never told I'm against fuzzing ;)

comment:16 in reply to: ↑ 11 Changed 7 years ago by mike.dld

Replying to d33tah:

@mike.dld: I believe that transmission could use some more fuzzing, but I don't have the CPU power to do that. Would you like to help? I could explain the process, it's really simple.

Sure, shoot. Not here though, #6019 would be a better place.

Note: See TracTickets for help on using tickets.