Opened 6 years ago

Last modified 3 years ago

#5161 new Bug

Unable to save resume file: Too many open files

Reported by: gatilisk
Owned by:
Priority: Normal
Milestone: None Set
Component: Daemon
Version: 2.73
Severity: Normal
Keywords: too many open files
Cc: nikoli@…

Description

Tried on 2.51, 2.61, and 2.73 (sources downloaded and compiled yesterday). The Transmission daemon is running on Debian Squeeze and serving about 4500 torrents.
# transmission-daemon -V
transmission-daemon 2.73 (13592)
# grep open-file settings.json
"open-file-limit": 200000
# ulimit -n
999999
# sysctl vm.max_map_count
vm.max_map_count = 300000

About 500-1000 torrents are permanently paused by the daemon with the status "Unable to save resume file: Too many open files".
settings.json is attached.

Attachments (1)

settings.json (2.5 KB) - added by gatilisk 6 years ago.
settings file


Change History (11)

Changed 6 years ago by gatilisk

settings file

comment:1 Changed 6 years ago by jordan

Transmission's cap on the open file limit is a duplicate of bug #5056.

The remaining "too many open files" issue is likely a duplicate of bug #4799, which can be resolved with a different build of libcurl. Does an lsof give similar results to what rb07 was seeing in #4799?

Also, props for seeding 4500 torrents.

comment:2 Changed 6 years ago by gatilisk

I've seen the mentioned bugs, but there is no "Changed open file limit from..." message in the log file; instead there are a lot of "Couldn't create socket: Too many open files (fdlimit.c:682)" messages.
The daemon uses about 500-1000 (rarely ~1100) descriptors, and about 100-200 of them are file descriptors.
I resumed all paused torrents:
# lsof -np 24497 | wc -l
758
# lsof -np 24497 | grep IPv4 | wc -l
654
# lsof -np 24497 | grep CLOSE_WAIT | wc -l
2
After 3 hours, 536 sockets and 80 files are open (0 sockets in the CLOSE_WAIT state), and 461 torrents are paused again with the status "Unable to save resume file: Too many open files". The number of half-closed sockets is stable (0-20) and doesn't increase after several hours.

comment:3 Changed 6 years ago by cfpp2p

paused again with status "Unable to save resume file: Too many open files"

I don't know if this will be any help, but I had a similar problem once and don't remember exactly how it was resolved for me. In the interim I used the following code, which adds the 'bool was = tor->isStopping;' section at the end of:

resume.c
tr_torrentSaveResume()

...

    filename = getResumeFilename( tor );
    if(( err = tr_bencToFile( &top, TR_FMT_BENC, filename )))
    {
        /* remember whether the torrent was already being stopped, since
         * setting a local error would otherwise flag it to stop */
        bool was = tor->isStopping;
        tr_torrentSetLocalError( tor, "Unable to save resume file: %s", tr_strerror( err ) );
        /* restore the flag so the failed save doesn't pause the torrent */
        tor->isStopping = was;
    }
    tr_free( filename );

    tr_bencFree( &top );
}

This prevents the torrents from pausing unless they are intentionally stopped, and it eliminated a lot of grief for me. With this the torrents keep running, but the resume file really isn't saved! Sorry, I can't remember exactly how I fixed the problem's cause, but I still carry this code with no ill effects.

comment:4 Changed 6 years ago by x190

From the attached settings.json: "peer-limit-global": 40240,

The default is 240, and the open-file limit, including peer connections, is hard-coded to 1024. Set it to less than 1024 (say 800 or less) and try again.

https://trac.transmissionbt.com/wiki/EditConfigFiles
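
For example, the relevant settings.json entry might look like this (800 is only an illustrative value that leaves headroom under the 1024 cap for the file cache and other descriptors):

# grep peer-limit-global settings.json
"peer-limit-global": 800,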

Jordan and livings124: Why isn't the upper limit made clear in the documentation and the [Mac Client] interface? For example, the Mac Client r13629 will accept values up to 3000 for 'Global maximum connections', but we never want more than about 800-900, since we need room for the "100-200 of them are file descriptors" mentioned in comment:2.

Last edited 6 years ago by x190

comment:5 follow-up: Changed 6 years ago by jordan

A few random thoughts...

  • I'd be surprised if transmission kept more than FILE_CACHE_SIZE (32) files open for local data.
  • open-file-limit has been deprecated for a while and is not used in the current codebase. It doesn't matter whether its value is 100 or 200000.
  • cfpp2p's patch seems more like a band-aid for keeping Transmission limping along even after there are too many open files... it's kind of a clever idea for using in the interim until the problem's tracked down, but I'd hate to use it in production.
  • x190's got a fair point about the peer limits. It's probably because the GUIs were in place before the fd limit was locked to 1024. x190, please make a separate ticket for that GUI issue.

So with that in mind... from the information given so far, it looks like most of these are IPv4 connections not in CLOSE_WAIT, but I'm not sure where to go from there. Does anyone have any thoughts on that?
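
For instance, a breakdown of the descriptor count by type might help narrow it down (this assumes the same pid 24497 as in comment:2; the fifth lsof column is TYPE, so this counts regular files vs. IPv4/IPv6 sockets vs. everything else):

# lsof -np 24497 | awk '{print $5}' | sort | uniq -c | sort -rn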

comment:6 Changed 6 years ago by x190

Hi Jordan: I think we've gone over this before! :)

From man getrlimit: "RLIMIT_NOFILE - The maximum number of open files for this process."

Setting max global peer limit to FD_SETSIZE is useless.

Total of peer sockets + FILE_CACHE_SIZE(32) + all other open files for this process must be <= FD_SETSIZE (hopefully 1024).

"all other open files for this process" can easily be over 100.

session->peerLimit should be <= (FD_SETSIZE - FILE_CACHE_SIZE - ~200), i.e. roughly 1024 - 32 - 200 ≈ 790.

static void
ensureSessionFdInfoExists( tr_session * session )
{
...
        const int FILE_CACHE_SIZE = 32;

...
        /* set the open-file limit to the largest safe size wrt FD_SETSIZE */
        if( !getrlimit( RLIMIT_NOFILE, &limit ) )
        {
            const int old_limit = (int) limit.rlim_cur;
            const int new_limit = MIN( limit.rlim_max, FD_SETSIZE );
            if( new_limit != old_limit )
            {
                limit.rlim_cur = new_limit;
                setrlimit( RLIMIT_NOFILE, &limit );
                getrlimit( RLIMIT_NOFILE, &limit );
                tr_inf( "Changed open file limit from %d to %d", old_limit, (int)limit.rlim_cur );
            }
        }
    }
}

  • If you really want session->peerLimit to be FD_SETSIZE, then you need to set limit.rlim_cur to FD_SETSIZE plus a buffer of say 256 (r11889 uses a buffer of 512) to handle 'all other open files for this process'.
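
A minimal sketch of that idea against the snippet above; the 256-descriptor buffer is only an assumed value (r11889 used 512):

        /* sketch: allow up to FD_SETSIZE peer sockets plus an assumed buffer
         * of 256 descriptors for the file cache and other open files */
        if( !getrlimit( RLIMIT_NOFILE, &limit ) )
        {
            const int old_limit = (int) limit.rlim_cur;
            const int new_limit = MIN( limit.rlim_max, FD_SETSIZE + 256 );
            if( new_limit != old_limit )
            {
                limit.rlim_cur = new_limit;
                setrlimit( RLIMIT_NOFILE, &limit );
                getrlimit( RLIMIT_NOFILE, &limit );
                tr_inf( "Changed open file limit from %d to %d", old_limit, (int) limit.rlim_cur );
            }
        }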
Last edited 5 years ago by x190

comment:7 Changed 5 years ago by Nikoli

  • Cc nikoli@… added

comment:8 in reply to: ↑ 5 Changed 5 years ago by x190

Replying to jordan:

A few random thoughts...

  • I'd be surprised if transmission kept more than FILE_CACHE_SIZE (32) files open for local data.

Using 'opensnoop -p' indicates that we are always at FD 32, or higher, even with only one torrent running. Should we increase this value?

comment:9 Changed 3 years ago by cfpp2p

I might be seeing the (unfixed) #6127 causing a problem here. Each time we scrape, slotsAvailable is incremented (but never decremented). If there are enough torrents with trackers, conditions may eventually arise where we scrape or announce so much that we're guaranteed to exceed 1024.

comment:10 Changed 3 years ago by x190

There are currently far too many ways to run into this issue in its various manifestations.

https://forum.transmissionbt.com/viewtopic.php?f=4&t=17991&sid=2e36d4cf3abdbe77153c64448576f11e&start=15#p73795
