Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#3984 closed Enhancement (invalid)

comment in fdlimit.c 'FIXME' : right voodoo for fallocate64

Reported by: Astara Owned by:
Priority: Low Milestone: None Set
Component: Transmission Version: 2.13
Severity: Minor Keywords:
Cc:

Description

Did you mean this:

/usr/include/bits/fcntl.h:extern int fallocate64 (int fd, int mode, off64_t offset,

(This is from a SuSE 11.2 machine)

Change History (21)

comment:1 in reply to: ↑ description Changed 11 years ago by Astara

Replying to Astara: This was in reference to the lines:

#ifdef HAVE_FALLOCATE64
  /* FIXME can't find the right #include voodoo to pick up the declaration.. */
  extern int fallocate64( int fd, int mode, uint64_t offset, uint64_t len );
#endif

(and was included in the glibc-devel rpm) ;-)

Last edited 11 years ago by Astara (previous) (diff)

comment:2 Changed 11 years ago by jordan

I don't understand what this ticket is suggesting. Is it about using off64_t instead of uint64_t? Is it about including bits/fcntl.h to pick up the fallocate64 prototype?

comment:3 Changed 11 years ago by Astara

Well the comment says it can't find the right #include necessary to pick up the declaration of the following function 'extern int fallocate64(...', so it was 'hand-included' instead of doing a "#include" to pull in the correct file.

I posted the name of a file (and the rpm it came from) that contained the same function declaration.

Given the question in the code, I thought the answer was fairly straightforward. But if I misunderstood the question...?

comment:4 Changed 11 years ago by jordan

So, what file do I include, and how, to get the fallocate64() prototype?

comment:5 Changed 11 years ago by Astara

/usr/include/bits/fcntl.h was the file that had the prototype on my suse 11.2 machine with the 'glibc-devel' package installed. (Sorry to be so unclear...sometimes I feel like I'm speaking a different language. Maybe I'll eventually get you all the info you need! ;-))....

Last edited 11 years ago by Astara (previous) (diff)

comment:6 Changed 11 years ago by jordan

Sorry if I'm being unclear. What I'm asking you is how this translates into source code. I think you're telling me that I should #include <bits/fcntl.h> ... is that right?

Last edited 11 years ago by jordan (previous) (diff)

comment:7 Changed 11 years ago by Astara

Ah, yes, if an include file is in /usr/include/<path>, you should be able to include it in an include statement by removing the /usr/include part and passing that to include.

Alternatively (not suggesting this, just giving you an optional syntax), you could add the "-I/usr/include/bits" as a switch to the compiler, then you could just use #include <fcntl.h>, but that type of change is usually done when you are including alot of files from a subdir (like if you doing X11 programming and included "/usr/include/X11")...

Always hard to communicate when you don't know how much the other person knows. I'm usually flexible but I still have to find out -- I do 'tech support' w/my 80+ year old mom on the phone! ;-) I figure if I can explain it to her, I'm probably not doing too bad in breaking things down...

FWIW, I assure it will be available on all platforms in that location, since the 'bits' subdir is linux specific -- and even on suse, you have to have the special "devel" package for 'glibc' installed to get it installed. On other distros it may be in a different package.

Last edited 11 years ago by Astara (previous) (diff)

comment:8 Changed 11 years ago by jordan

...so did you try adding #include <bits/fcntl.h>} to fdlimit.c before opening this ticket, to see what would happen? :)

comment:9 Changed 11 years ago by jordan

If not, try it now and see what happens.

comment:10 Changed 11 years ago by Astara

There I go again, answering the wrong question...

The question isn't "where is the prototype for fallocate64", but should be what is the portable, posix specified method for doing allocation with a 64-bit file size, namely,

posix_fallocate;

This seems to be correct for a 64-bit system, where "posix_fallocate64" is not defined and shouldn't be used, since offsets on 64-bit platforms are natively 64-bits.

I found this comment in the asm-generic/unistd.h file:

 * 32 bit systems traditionally used different
 * syscalls for off_t and loff_t arguments, while
 * 64 bit systems only need the off_t version.
 * For new 32 bit platforms, there is no need to
 * implement the old 32 bit off_t syscalls, so
 * they take different names.

A 32-bit machine would likely have the have the _-_USE_LARGEFILE64 and _-_USE_FILE_OFFSET64 macros defined, so they would get the '64' bit versions of the calls.

Something like the following code patch should work, but I can't compile it for the other various platforms or the current 'head' (since my work tree is a bit older than 'head'), but trying to compile this module in my work tree gave a few unrelated errors that I assumed had to do with other changes/fixes.

What a mess! (more notes below)

--- fdlimit.c.orig      2011-02-03 05:13:04.000000000 -0800
+++ fdlimit.c   2011-02-07 17:11:06.093623125 -0800
@@ -27,14 +27,9 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
-#ifdef SYS_DARWIN
- #include <fcntl.h>
-#endif
 
-#ifdef HAVE_FALLOCATE64
-  /* FIXME can't find the right #include voodoo to pick up the declaration.. */
-  extern int fallocate64( int fd, int mode, uint64_t offset, uint64_t len );
-#endif
+#include <fcntl.h>
+
 
 #ifdef HAVE_XFS_XFS_H
  #include <xfs/xfs.h>
@@ -89,9 +84,11 @@
     if( !length )
         success = TRUE;
 
-#ifdef HAVE_FALLOCATE64
-    if( !success ) /* fallocate64 is always preferred, so try it first */
-        success = !fallocate64( fd, 0, 0, length );
+#ifdef HAVE_POSIX_FALLOCATE64
+    if (!success) /* fallocate64 is always preferred, so try it first */
+        success = !posix_fallocate64(fd, 0, length);
+#else
+        success = !posix_fallocate(fd, 0, length);
 #endif
 
     if( !success ) /* fallback: the old-style seek-and-write */
@@ -124,11 +121,9 @@
     int fd = open( filename, flags, 0666 );
     if( fd >= 0 )
     {
-# ifdef HAVE_FALLOCATE64
-       if( !success )
-       {
-           success = !fallocate64( fd, 0, 0, length );
-       }
+# ifdef HAVE_POSIX_FALLOCATE64
+       if (!success)
+           success = !posix_fallocate64(fd, 0, length);
 # endif
 # ifdef HAVE_XFS_XFS_H
         if( !success && platform_test_xfs_fd( fd ) )

There is some ambiguity in the above code changes, since I don't know what the original intention was.

Namely, there's a use of an xfs-specific routine (in two places) that appears to be given priority over the posix_fallocate in the one place it is used.

The comment said that fallocate64 was the most preferred, but not sure why. It was there I made the change, changing it to a call to posix_fallocate64 (which takes 1 arg less than fallocate64, BTW, a 'mode' argument that only has one allowed value), but then I followed that in conditional compilation with the assumption (maybe it should be tested?) that if we didn't use the 64-bit version of posix_fallocate(64), then use the _native_ call which is 64-bits.

This sorta assumes that the 64-bit calls are defined on 32 bit platforms, as there is where they are needed (since default offsets were too small).

But this would likely place posix_fallocate above the 'xfs' code in terms of priority.

I'm not sure if that's a good thing or not. Ideally at the lower levels, the OS would be smart enough to call the xfs calls if it was an xfs file system, ...B..U..T.., I find that unlikely. So you may wish to have the call to the xfs-allocate before the attempted useage of a native 64-bit posix_fallocate call.

I've no clue what Darwin supports, but it looks like it had fcntl.h included twice if platform was Darwin: once in a conditional just for Darwin, and a 2nd time, farther down for all platforms. Was that intentional?

And here I thought this would be an easy fix of a minor 'FIXME' comment...why can't anything just be simple....

Let me know if there's anything else you want me to look into...

Last edited 11 years ago by Astara (previous) (diff)

comment:11 Changed 11 years ago by jordan

Unfortunately that patch won't work. We really shouldn't use fallocate and posix_fallocate interchangably. Theodore Tso summarized the problem in Linux magazine:

The fallocate() system call is not in most glibc’s as of this writing, but posix_fallocate() is; the problem with posix_fallocate is that if you use it on ext3, it will attempt to emulate fallocate() by writing all zeros to the file. This emulation step can be very slow, and may come as a surprise to the application that was expecting posix_fallocate() to be quick; the fallocate() system call has the advantage that if it is not present, it will fail, and the application can then decide on its own what it wants to do.

Last edited 11 years ago by jordan (previous) (diff)

comment:12 Changed 11 years ago by jordan

context: #2849

comment:13 Changed 11 years ago by Astara

Oh frack.

That's ugly -- but then ext3 was always a bit broken...

Um, hmmm...is what he said about fallocate/posix_fallocate applicable to fallocate64/posix_fallocate64?

When did Ted write that? I wonder if it is still a problem...

Makes it really ugly for apps to try to be portable but watch out for broken implementations.

Ugly ugly ugly...

comment:14 Changed 11 years ago by jordan

Yes it still applies, and applies to them all.

So that's why it's prototyped the way it is... ;)

comment:15 Changed 11 years ago by jordan

Astara, do you have anything left for this ticket? I don't see anything actionable here.

comment:16 Changed 11 years ago by Astara

If it was my code, I'd leave a comment in the code explaining why it was done a certain way so any future maintainer (including myself when I've sufficiently forgot my previous context) would know it was done in a non-portable way in order to get around a known bug in the ext3 file system that was last known to be a problem on (last date verified). I'd try to leave some contact info or reference to 1 or more websites where I could check on the status of their bug.

I'd also tend to write the code the correct way, then implement a check that checks for 'fs==ext3', then ... do obscure workaround, that way the impact of using the workaround code is clearly limited to the known problem case.

Now it may be it's a problem on other platforms, in which case, that should, IMO, be documented. I'm surprised a bug like this hasn't been addressed, in some form, since it seems trivial for those responsible for the posix_fallocate function to trivially implement a quick workaround for the problem (i.e. -- just call 'fallocate64').

Do you have a pointer to the note or magazine issue where Ted made this claim? Since the impact of this bug is 'high' and the workaround is so 'trivial', I'd probably follow up with the owners of the library (gnu?) to see why they hadn't issued a patch/fix. Maybe they have -- in which case, I'd write my own posix_fallocate function that implements the workaround/patch if it detects it is on ext3 and isn't operating with the fixed gnu library...

Unless this 'incident' is documented under another incident number, you might want to leave this open, as it was filed as an 'enhancement' w/low priority.

I believe that designation still applies: something that would be good to address (would make the source more portable), but isn't really "broken" as it does work -- thus the 'enhancement' category. And since not alot can be done about it right now, it would fall into the 'low priority' category. Do you have thoughts/ideas about it that would support alternate characterizations?

If you want to close it, you can, I won't be upset, but given that it is a problem, I wouldn't think that's defeating the point of the database -- to hold issues or ideas about the project -- not all of which are going to be addressable 'immediately', but a way to document those issues so things (problems, enhancements, ideas, etc) don't fall between the cracks...

comment:17 Changed 11 years ago by jordan

  • Resolution set to invalid
  • Status changed from new to closed

If you'd followed the link I sent to #2849 you would see that Tso wrote that in Linux Magazine.

Also, IMO it's not a bug, just two features that have different documented behaviors that happen to share a word in the function name. The problem is that some filesystems don't support fast preallocation. But if you want to take that up upstream, that's your business.

comment:18 Changed 11 years ago by Astara

So you think coding against undocumented calls is 'good practice'.

Who said this was a 'bug'? It was an RFE enhancement_ -- to *CLEAN* up sloppy coding practice using non-portable interfaces.

You think that's a bad idea?

As for file systems not supporting fast pre-allocation -- if ext3 didn't support support it, why would it work, at all, with fdallocate64?

As for pre-allocating file space that way -- xfs has a delay'ed zero feature built into it, but it was created 10 years before ext3. It had to support writing multiple streams of uncompressed video to disk in parallel and not end up fragmented -- almost 15 years ago. It also had to support a requirement that came (from the security community) not too long afterward of ensuring that any space that had been previously used, was re-initialized before allowing its re-use (so you don't read the previous owner's content.

XFS gets around writing 0's to disk as it marks those sectors as needing to be zeroed before being read -- a persistent flag for those sectors that remains across reboots. So if the first thing a process does is write to them, no zeroing ever occurs, but if a read is attempted, uninitialized areas are zero-filled in memory, on the assumption that the user may then write the data out to disk -- thus causing any old data to eventually be overwritten.

What happens on ext3, if you 'seek' to the desired location, write a 'byte' -- I take it that doesn't do the type of pre-allocation you want?

Last edited 11 years ago by Astara (previous) (diff)

comment:19 Changed 11 years ago by jordan

Do you have a pointer to the note or magazine issue where Ted made this claim? Since the impact of this bug is 'high' and the workaround is so 'trivial', I'd probably follow up with the owners of the library (gnu?) to see why they hadn't issued a patch/fix. Maybe they have -- in which case, I'd write my own posix_fallocate function that implements the workaround/patch if it detects it is on ext3 and isn't operating with the fixed gnu library...

Also, IMO it's not a bug, just two features that have different documented behaviors that happen to share a word in the function name. The problem is that some filesystems don't support fast preallocation. But if you want to take that up upstream, that's your business.

Who said this was a 'bug'? It was an RFE enhancement_ -- to *CLEAN* up sloppy coding practice using non-portable interfaces.

You did.


What happens on ext3, if you 'seek' to the desired location, write a 'byte' -- I take it that doesn't do the type of pre-allocation you want?

Correct. Sparse allocation doesn't reduce file fragmentation.

comment:20 follow-up: Changed 11 years ago by Astara

My reference to calling it a 'bug' was the problem with ext3 doing real-time zero file -- not in transmission. In transmission it's about cleaning up the code to not have to duplicate headers for external routines -- that was the enhancement part.

What I don't see about the ext3 hack of doing zero fill: I don't see how that guarantees the space will be contiguous.

comment:21 in reply to: ↑ 20 Changed 11 years ago by jordan

Replying to Astara:

My reference to calling it a 'bug' was the problem with ext3 doing real-time zero file -- not in transmission. In transmission it's about cleaning up the code to not have to duplicate headers for external routines -- that was the enhancement part.

...which we've talked about now for 20 comments. Do you have a working diff for fdlimit.c that picks up the system's prototype for fallocate64?

What I don't see about the ext3 hack of doing zero fill: I don't see how that guarantees the space will be contiguous.

zero fill guarantees full allocation, but not contiguous or fast allocation which is why it's only used in preallocate_file_full(), and only as a last resort after several better attempts have failed. This shouldn't be confusing.

Note: See TracTickets for help on using tickets.