Opened 8 years ago

Closed 8 years ago

#5151 closed Bug (worksforme)

Stalled/Stuck Torrents - Index Offset Error, Invalid argument, tr_fdFileCheckout and Couldn't truncate

Reported by: helloadam
Owned by:
Priority: Normal
Milestone: None Set
Component: Daemon
Version: 2.73
Severity: Normal
Keywords: index, offset, length, tr_fdFileCheckout, Invalid argument, inout
Cc:

Description

The Setup

Client-server model where the server distributes files to all clients. The server runs Fedora Core 17 x86_64 with Transmission 2.73 compiled from source. The clients are standalone servers running a mix of BitTorrent clients (Transmission and rTorrent).

The Issue

After we create a torrent (using http://mktorrent.sourceforge.net/) and upload it to our internal tracker, the server (the initial seeder) stalls on that specific torrent. Example: Tue-Nov-27.001.RAW.CID-738193281039340REGH

Our other nodes (clients) may get anywhere from 0% to 90% of the files before the server stalls on that specific torrent. All other torrents still work.

Attempted Solutions, Fixes and Strange Behaviors

When a torrent gets stuck (example: Tue-Nov-27.001.RAW.CID-738193281039340REGH), pausing and restarting the torrent does not fix it. Announcing for more peers does not fix it. Rehashing (verifying local data) does not fix it either; it reports 100% okay.

One strange behavior is that the server and the client can both see each other in the peers list. The client sees the server, which has 100% of the file. The client shows the server's peer status as DEI, while the server shows the client's status as UE. The "Down" column is empty on both the client and the server.

Logs

Here are some snippets that may be relevant, or that will at least help someone point me in the right direction.

Nov 27 20:12:58 node4 transmission-daemon[621]: [20:12:58.676] Tue-Nov-27.001.RAW.CID-738193281039340REGH index 1634890786 offset 1919249251 length 1952543827 err 1

Nov 27 20:25:53 node4 transmission-daemon[621]: [20:25:53.564] Couldn't truncate "/home/glftpd/torrents/.downloads/Tue-Nov-27.001.RAW.CID-738193281039340REGH/Sample/0001.avi": Invalid argument (fdlimit.c:396)

Nov 27 20:26:27 node4 transmission-daemon[621]: [20:26:27.588] Tue-Nov-27.001.RAW.CID-738193281039340REGH index 0 offset 1361900352 length 32647 err 3

Nov 27 20:26:27 node4 transmission-daemon[621]: [20:26:27.588] Tue-Nov-27.001.RAW.CID-738193281039340REGH tr_fdFileCheckout failed for "/home/glftpd/torrents/.downloads/Tue-Nov-27.001.RAW.CID-738193281039340REGH/Sample/0001.avi": Invalid argument (inout.c:100)
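The first log line reports an index, offset and length that are each well over a billion, which no ordinary piece layout can contain. The log does not say what `err 1` or `err 3` mean, or whether `index` is a piece index or a file index. The minimal sketch below assumes it is a piece index and checks the triple the way a BitTorrent request (index, begin, length) would be bounds-checked. `TorrentGeometry` and `request_in_bounds` are hypothetical helpers, not Transmission code, and the sizes must come from the real .torrent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TorrentGeometry:
    """Hypothetical torrent layout; fill in values from the real .torrent."""
    total_size: int    # sum of all file lengths, in bytes
    piece_length: int  # "piece length" from the info dictionary, in bytes

    @property
    def piece_count(self) -> int:
        # Ceiling division: the last piece may be shorter than piece_length.
        return (self.total_size + self.piece_length - 1) // self.piece_length

    def piece_size(self, index: int) -> int:
        # Every piece is piece_length bytes except possibly the last one.
        if index == self.piece_count - 1:
            return self.total_size - index * self.piece_length
        return self.piece_length


def request_in_bounds(geo: TorrentGeometry, index: int, offset: int, length: int) -> bool:
    """Return True if (index, offset, length) addresses bytes inside a single piece.

    Assumes `index` is a piece index (an assumption; the log does not say).
    """
    if not 0 <= index < geo.piece_count:
        return False
    if offset < 0 or length <= 0:
        return False
    return offset + length <= geo.piece_size(index)
```

With any plausible geometry, the values from the 20:12:58 log line fail the very first check, because the index alone exceeds the piece count.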

Change History (8)

comment:1 Changed 8 years ago by cfpp2p

There seem to be some distinct similarities to other ticket reports:

Or at least helpful in having someone point me in the right direction.

Possibly the same problem; maybe try using strace, as suggested in this similar ticket: https://trac.transmissionbt.com/ticket/4147#comment:19

No seeding here either: https://trac.transmissionbt.com/ticket/5135#comment:3

Last edited 8 years ago by cfpp2p

comment:2 follow-up: Changed 8 years ago by rb07

It looks like a simple "permission denied" problem.

But in fact it's two problems:

  1. The file on disk has a different size than the file described by the .torrent file, which means it has been modified; that is what causes the whole problem.
  2. The file has ownership and permissions that don't allow transmission-daemon to truncate it to the size specified in the .torrent file.

Why is Transmission trying to truncate it? That's a question for the developers; the code was supposed to fix another problem (ref. https://trac.transmissionbt.com/browser/trunk/libtransmission/fdlimit.c#L385)
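The first problem can be checked directly by comparing each file's size on disk with the length the .torrent declares for it. Below is a minimal sketch, assuming the (relative path, declared length) pairs have already been extracted from the torrent's info dictionary; bencode parsing is not shown, and `find_size_mismatches` is a hypothetical helper rather than part of Transmission.

```python
import os
from typing import Iterable, List, Tuple


def find_size_mismatches(
    download_dir: str, files: Iterable[Tuple[str, int]]
) -> List[Tuple[str, int, int]]:
    """Compare on-disk sizes with the lengths declared in the .torrent.

    `files` holds (relative_path, declared_length) pairs, assumed to have been
    extracted from the torrent beforehand. Returns a list of
    (relative_path, declared_length, actual_size) for every file whose size
    differs. Files that do not exist yet are skipped.
    """
    mismatches = []
    for rel_path, declared in files:
        full_path = os.path.join(download_dir, rel_path)
        try:
            actual = os.path.getsize(full_path)
        except FileNotFoundError:
            continue  # not downloaded yet, so not a size mismatch
        if actual != declared:
            mismatches.append((rel_path, declared, actual))
    return mismatches
```

Running this over the download directory of a stuck torrent would show whether any file, such as Sample/0001.avi, differs in size from what the torrent expects.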

comment:3 in reply to: ↑ 2 Changed 8 years ago by helloadam

Replying to rb07:

  1. The file on disk has a different size than the file described by the .torrent file, which means it has been modified; that is what causes the whole problem.

If this is the problem, then why does the hash check (verify local data) say it's 100% okay?

  2. The file has ownership and permissions that don't allow transmission-daemon to truncate it to the size specified in the .torrent file.

Why is Transmission trying to truncate it? That's a question for the developers; the code was supposed to fix another problem (ref. https://trac.transmissionbt.com/browser/trunk/libtransmission/fdlimit.c#L385)

All files are owned by the right user and have the correct permissions. The user can access them, etc.

Replying to cfpp2p:

Or at least helpful in having someone point me in the right direction.

Possibly the same problem; maybe try using strace, as suggested in this similar ticket: https://trac.transmissionbt.com/ticket/4147#comment:19

I will see how helpful strace is when I attach it to the running PID and all its forks.

Thanks!

comment:4 Changed 8 years ago by cfpp2p

If this is the problem, then why does the hash check (verify local data) say it's 100% okay?

This could happen in some very fringe situations, best described by: https://bugs.launchpad.net/ubuntu/+source/transmission/+bug/318249/comments/10

More complete information is here: https://bugs.launchpad.net/ubuntu/+source/transmission/+bug/318249

I think what bug 318249 above is trying to describe is this: the torrent's metadata was created from a truncated version of the file that Transmission is now verifying. The file being verified is actually larger, but it still passes verification because the data up to the length recorded in the torrent is identical.

I can make sense of this, but it is a bit convoluted and difficult to describe concisely. In the context of this stuck-torrents bug, why the file's size at fdlimit.c line 391 is less than that of the file described by the .torrent file remains to be seen.
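The mechanism described above can be illustrated with a minimal sketch, simplified to a single-file torrent: BitTorrent v1 stores a SHA-1 hash per piece, and a verifier that reads only the number of bytes the torrent declares never looks at anything appended after them. This is not Transmission's verifier (which is C code in libtransmission); `piece_hashes` and `verify_declared_range` are hypothetical helpers written only to show why an oversized file can still report 100%.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """SHA-1 digest of each piece, as in a BitTorrent v1 'pieces' field."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_declared_range(path: str, declared_length: int,
                          piece_length: int, expected: List[bytes]) -> bool:
    """Hash only the first `declared_length` bytes of the file.

    Bytes beyond declared_length are never read, so extra data appended to
    the file cannot make this check fail.
    """
    with open(path, "rb") as f:
        data = f.read(declared_length)
    if len(data) < declared_length:
        return False  # the file is shorter than the torrent says
    return piece_hashes(data, piece_length) == expected
```

A file that is longer than the torrent declares, but identical up to the declared length, therefore passes this kind of check, while a file that is too short fails it.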

Last edited 8 years ago by cfpp2p

comment:5 Changed 8 years ago by helloadam

Okay, here is some strange but good news. Switching to a different version of libevent (from libevent 2.0 to libevent 1.4) and to Transmission 2.13 has resolved the issue!

A few things to note from my attempt to track down this issue:

  • The problem nodes were running libevent 2.0 on Fedora Core 17 x86_64.
  • Transmission 2.73 was compiled from source on those nodes.

On a new instance we did the following:

  • Switched to CentOS 6.x; using its libevent 2.0 package with Transmission 2.73 compiled from source worked.

Back on FC 17 x86_64, we simply downgraded to libevent 1.4 and used an RPM of Transmission 2.13, and it worked. We plan on switching to CentOS and using the latest version.

In case someone wants it, here is the ldd difference for transmission-daemon: http://static.netops.me/cos06.vs.fc17-transmission.diff
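For anyone repeating this comparison on their own machines, a minimal sketch of turning two `ldd transmission-daemon` outputs into a per-library diff follows. The linked diff is not reproduced here; the sample lines in the usage note are illustrative only, and `parse_ldd` and `diff_ldd` are hypothetical helpers. Only lines of the form `name => path` are considered, so entries such as `linux-vdso.so.1` and the dynamic loader line are skipped.

```python
import re
from typing import Dict, Tuple

# Matches "libevent-2.0.so.5 => /usr/lib64/libevent-2.0.so.5 (0x...)"
# as well as "libfoo.so.1 => not found".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(output: str) -> Dict[str, str]:
    """Map each shared-library name to the path ldd resolved it to."""
    libs = {}
    for line in output.splitlines():
        m = _LDD_LINE.match(line)
        if m:
            libs[m.group(1)] = m.group(2)
    return libs


def diff_ldd(a: str, b: str) -> Dict[str, Tuple[str, str]]:
    """Return {name: (path_in_a, path_in_b)} for libraries that differ.

    A library missing from one side is reported with an empty string.
    """
    la, lb = parse_ldd(a), parse_ldd(b)
    return {
        name: (la.get(name, ""), lb.get(name, ""))
        for name in sorted(set(la) | set(lb))
        if la.get(name) != lb.get(name)
    }
```

For example, if one host resolves `libevent-2.0.so.5` to `/usr/lib64/...` and the other to `/usr/local/lib/...`, that library appears in the result while identical entries such as `libc.so.6` are omitted.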

Not sure how helpful this is, but I figured I might as well describe how I fixed this. If you need more information, I am happy to help.

Last edited 8 years ago by helloadam

comment:6 Changed 8 years ago by x190

Is Fedora Core 17 using the very latest libevent version (2.0.21)?
http://libevent.org

comment:7 Changed 8 years ago by helloadam

We were running the following version:

2.0.18-1.fc17

comment:8 Changed 8 years ago by helloadam

  • Resolution set to worksforme
  • Status changed from new to closed

FYI: I just compiled Transmission 2.73 and libevent 2.0.21 from source on a CentOS-based machine, and it is currently working fine (knock on wood). I am still not sure what the issue was with the Fedora Core library, but regardless, the fix was to downgrade versions.