#include <fcntl.h> int open (const char *filename, int flags[, mode_t mode]) |
The open
function creates and returns a new file descriptor
for the file named by filename. Initially, the file position
indicator for the file is at the beginning of the file. The argument
mode specifies file permissions and is used only when a file is created.
The flags argument controls how the file is to be opened. This is a bit mask; you create the value by the bitwise OR of the appropriate parameters (using the `|' operator in C). See File Status Flags, for the parameters available.
The normal return value from open
is a non-negative integer file
descriptor. In the case of an error, a value of -1 is returned
instead.
#include <unistd.h> int close (int filedes) |
The function close
closes the file descriptor filedes.
The normal return value from close
is 0; a value of -1
is returned in case of failure. The usual cause of failure is an argument
that is not a valid file descriptor.
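As a minimal sketch of the open/close pairing described above, the following function opens a file read-only, checks for the -1 error return, and closes the descriptor again. The helper name `file_is_readable` is hypothetical and is not part of any library.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: report whether a file can be opened for reading.
   Returns 1 if open() succeeds, 0 otherwise. */
int file_is_readable(const char *filename)
{
    /* No mode argument is needed because nothing is created. */
    int fd = open(filename, O_RDONLY);
    if (fd == -1)
        return 0;               /* errno describes the failure */

    /* Always release the descriptor; close() returns 0 on success. */
    if (close(fd) == -1)
        return 0;
    return 1;
}
```

A caller might use it as `if (!file_is_readable("data.txt")) perror("data.txt");`, where the file name is only an example.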
One and only one of the following three constants (from fcntl.h) may be specified in the flags argument:
Flag | Description |
---|---|
O_RDONLY | Open the file for read access. |
O_WRONLY | Open the file for write access. |
O_RDWR | Open the file for both reading and writing. |
The following constants are optional:
Flag | Description |
---|---|
O_APPEND | Append to the end of the file on each write. |
O_CREAT | Create the file if it doesn't already exist. |
O_EXCL | If both O_CREAT and O_EXCL are set, then open fails if the specified file already exists. |
O_TRUNC | Truncate the file to zero length. |
O_NOCTTY | If the named file is a terminal device, don't make it the controlling terminal for the process. |
O_NONBLOCK | This prevents open from blocking for a “long time” to open the file. |
O_NOLINK | If the named file is a symbolic link, open the link itself instead of the file it refers to. |
The use of O_NONBLOCK is only meaningful for some kinds of files, usually devices such
as serial ports; when it is not meaningful, it is harmless and
ignored. Often opening a port to a modem blocks until the modem
reports carrier detection; if O_NONBLOCK is specified, open will
return immediately without a carrier.
#include <unistd.h> ssize_t read (int filedes, void *buffer, size_t size); ssize_t write (int filedes, const void *buffer, size_t size); |
Data Type: ssize_t is used to represent the sizes of blocks that can be
read or written in a single operation. It is similar to size_t,
but must be a signed type.
read
The read
function reads up to size bytes from the file
with descriptor filedes, storing the results in the buffer.
(This is not necessarily a character string, and no terminating null
character is added.)
The return value is the number of bytes actually read. This might be less than size; for example, if there aren't that many bytes left in the file or if there aren't that many bytes immediately available. The exact behavior depends on what kind of file it is. Note that reading less than size bytes is not an error.
A value of zero indicates end-of-file (except if the value of the
size argument is also zero). This is not considered an error.
If you keep calling read
while at end-of-file, it will keep
returning zero and doing nothing else.
If read
returns at least one character, there is no way you can
tell whether end-of-file was reached. But if you did reach the end, the
next read will return zero.
In case of an error, read
returns -1.
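The return-value rules above (a short count is not an error, zero means end-of-file, -1 means failure) lead to the usual read loop. The sketch below counts the bytes remaining in a descriptor; the name `count_bytes` and the 4096-byte buffer are arbitrary choices for illustration, and interrupted calls are not retried.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the bytes left in an open descriptor by
   reading until read() returns 0 (end-of-file).
   Returns the byte count, or -1 if read() fails. */
long count_bytes(int fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;             /* a short read is not an error */

    return (n == -1) ? -1 : total;
}
```

Calling it a second time on the same descriptor returns 0, since read keeps reporting end-of-file.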
write
The write
function writes up to size bytes from
buffer to the file with descriptor filedes. The data in
buffer is not necessarily a character string and a null character is
output like any other character.
The return value is the number of bytes actually written. This may be
size, but can always be smaller. Your program should always call
write
in a loop, iterating until all the data is written.
In the case of an error, write
returns -1.
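The advice to call write in a loop can be sketched like this. The helper name `write_all` is hypothetical; retrying when errno is EINTR is a common convention that goes slightly beyond what the text above requires.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `size` bytes
   have been written. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buffer, size_t size)
{
    const char *p = buffer;

    while (size > 0) {
        ssize_t n = write(fd, p, size);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        p += n;                 /* skip past the bytes already written */
        size -= (size_t) n;
    }
    return 0;
}
```

Because the buffer is treated as raw bytes, embedded null characters are written like any other byte.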
Once write
returns, the data is enqueued to be written and can be
read back right away, but it is not necessarily written out to permanent
storage immediately. You can use fsync
when you need to be sure
your data has been permanently stored before continuing. (It is more
efficient for the system to batch up consecutive writes and do them all
at once when convenient. Normally they will always be written to disk
within a minute or less.) Modern systems provide another function
fdatasync
which guarantees integrity only for the file data and
is therefore faster.
The file system for this test was the Linux ext2 file system with 4,096-byte blocks. This accounts for the minimum in the system time occurring at a BUFSIZE of 4,096 bytes. Increasing the buffer size beyond this has little positive effect.
Most file systems support some kind of read-ahead to improve performance. When sequential reads are detected, the system tries to read in more data than the application requests, assuming that the application will read it shortly. From the table, it appears that read-ahead in ext2 stops having an effect after 128KB.
Beware when trying to measure the performance of programs that read and write files. The operating system will try to cache the file in memory, so if you measure the performance of the program repeatedly, the successive timings will likely be better than the first. This is because the first run will cause the file to be entered into the system's cache, and successive runs will access the file from the system's cache instead of from the disk.
BUFSIZE | User CPU (seconds) | System CPU (seconds) | Clock time (seconds) | #loops |
---|---|---|---|---|
1 | 124.89 | 161.65 | 288.64 | 103,316,352 |
2 | 63.10 | 80.96 | 145.81 | 51,658,176 |
4 | 31.84 | 40.00 | 72.75 | 25,829,088 |
8 | 15.17 | 21.01 | 36.85 | 12,914,544 |
16 | 7.86 | 10.27 | 18.76 | 6,457,272 |
32 | 4.13 | 5.01 | 9.76 | 3,228,636 |
64 | 2.11 | 2.48 | 6.76 | 1,614,318 |
128 | 1.01 | 1.27 | 6.82 | 807,159 |
256 | 0.56 | 0.62 | 6.80 | 403,579 |
512 | 0.27 | 0.41 | 7.03 | 201,789 |
1,024 | 0.17 | 0.23 | 7.84 | 100,894 |
2,048 | 0.05 | 0.19 | 6.82 | 50,447 |
4,096 | 0.03 | 0.16 | 6.86 | 25,223 |
8,192 | 0.01 | 0.18 | 6.67 | 12,611 |
16,384 | 0.02 | 0.18 | 6.87 | 6,305 |
32,768 | 0.00 | 0.16 | 6.70 | 3,152 |
65,536 | 0.02 | 0.19 | 6.92 | 1,576 |
131,072 | 0.00 | 0.16 | 6.84 | 788 |
262,144 | 0.01 | 0.25 | 7.30 | 394 |
524,288 | 0.00 | 0.22 | 7.35 | 198 |
In the tests reported here, each run with a different buffer size was made using a different copy of the file, so that the current run didn't find the data in the cache from a previous run. The files are large enough that they don't all remain in the cache (the test system was configured with 512 MB of RAM).
The following discussion is excerpted from Das, p 515.
To appreciate the debate that concerns system calls and library
functions, you need to know something about the way disk I/O actually
takes place. The read and write calls never access the disk directly.
Rather, they read and write a pool of kernel buffers, called the
buffer cache. If the buffer is found to be empty during a
read, the kernel instructs the disk controller to read data from disk
and fill up the cache. read
blocks (waits) while the disk is being
read and the process even relinquishes control of the CPU.
To ensure that a single invocation of read
gathers all bytes stored
in the kernel buffer, the size of the latter and buffer used by read
(char buf[BUFSIZE]
in a previous example) should be equal. Improper
setting of the buffer size can make your program inefficient. So if
each kernel buffer stores 8192 bytes, then BUFSIZE should also be
set to 8192. A smaller figure makes I/O inefficient, but a larger
figure doesn't improve performance.
write
also uses the buffer cache, but it differs from
read
in one way: it returns immediately after the call is
invoked. The kernel writes the buffer to disk later at a convenient
time. Database applications often can't accept this behavior, in which
case you should open a file with the O_SYNC
status flag
to ensure that write
doesn't return until the kernel has
finally written the buffer to disk.
Unlike the standard library functions, the read and write calls are
unbuffered when they interact with the terminal. When you use
write
to
output a string to the terminal, the string appears on your display as
soon as the call is invoked. On the other hand, the standard library
functions (like printf
) are line-buffered when they
access the terminal. That means a string is printed on the terminal
only when the newline character is encountered.
The size of the kernel buffer is system-dependent and is set at the time of installation of the operating system. To develop portable and optimized applications, you must not use a feature that is system-dependent. You can't arbitrarily set BUFSIZE to 8192. This is where library functions come in.
The I/O-bound library functions use a buffer in the FILE structure
and adjust its size dynamically during runtime
using malloc. Unless you are using system calls for their
exclusive features, it makes sense to use library functions on most
occasions.
#include <unistd.h> off_t lseek (int filedes, off_t offset, int whence) |
The lseek
function is used to change the file position of the
file with descriptor filedes.
The whence argument specifies how the offset should be
interpreted, in the same way as for the fseek function, and it must
be one of the symbolic constants SEEK_SET, SEEK_CUR, or SEEK_END.
Flag | Description |
---|---|
SEEK_SET | Seek from the beginning of the file. |
SEEK_CUR | Seek from the current file position. This count may be positive or negative. |
SEEK_END | Seek from the end of the file. |
The return value from lseek
is normally the resulting file
position, measured in bytes from the beginning of the file.
You can use this feature together with SEEK_CUR
to read the
current file position:
cur_pos = lseek(fd,0,SEEK_CUR);
If the file position cannot be changed, or the operation is in some way
invalid, lseek
returns a value of -1.
A negative count with SEEK_END
specifies a position within the current
extent of the file; a positive count specifies a position past the
current end. If you set the position past the current end, and
actually write data, you will extend the file with zeros up to that
position.
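Combining the three whence values, one way to find the size of an open file is to remember the current position, seek to the end, and seek back. This is only a sketch; the helper name `size_of_open_file` is hypothetical, and it fails on descriptors that cannot seek, such as pipes.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the size in bytes of a seekable open
   file without disturbing its file position. Returns -1 on error. */
off_t size_of_open_file(int fd)
{
    off_t cur = lseek(fd, 0, SEEK_CUR);     /* remember where we are */
    if (cur == (off_t) -1)
        return -1;

    off_t end = lseek(fd, 0, SEEK_END);     /* end offset == file size */
    if (end == (off_t) -1)
        return -1;

    if (lseek(fd, cur, SEEK_SET) == (off_t) -1)   /* restore position */
        return -1;
    return end;
}
```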
If you want to append to the file, setting the file position to the
current end of file with SEEK_END
is not sufficient. Another
process may write more data after you seek but before you write,
extending the file so the position you write onto clobbers their data.
Instead, use the O_APPEND
operating mode.
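A short sketch of appending with O_APPEND rather than seeking to the end. The function name `append_bytes` is hypothetical, and for brevity a short write is treated as a failure instead of being retried.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append `size` bytes to the named file,
   creating it if necessary. Returns 0 on success, -1 on error. */
int append_bytes(const char *filename, const void *data, size_t size)
{
    /* With O_APPEND the kernel moves to end-of-file on every write,
       so no separate lseek() is needed. */
    int fd = open(filename, O_WRONLY | O_APPEND | O_CREAT,
                  S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, data, size);
    int rc = close(fd);

    return (n == (ssize_t) size && rc == 0) ? 0 : -1;
}
```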
You can set the file position past the current end of the file. This
does not by itself make the file longer; lseek
never changes the
file. But subsequent output at that position will extend the file.
Characters between the previous end of file and the new position are
filled with zeros. Extending the file in this way can create a
“hole”: the blocks of zeros are not actually allocated on disk, so the
file takes up less space than it appears to; it is then called a
“sparse file”.
The lseek function is the underlying primitive for the
fseek, fseeko, ftell, ftello and rewind functions, which operate on streams instead of file
descriptors.
#include <sys/stat.h> int stat (const char *filename, struct stat *buf); int fstat (int filedes, struct stat *buf); int lstat (const char *filename, struct stat *buf); |
The stat
function returns information about the attributes of the
file named by filename in the structure pointed to by buf.
If filename is the name of a symbolic link, the attributes you get
describe the file that the link points to. If the link points to a
nonexistent file name, then stat
fails reporting a nonexistent
file.
The return value is 0
if the operation is successful, or
-1
on failure.
The fstat
function is like stat
, except that it takes an
open file descriptor as an argument instead of a file name.
Like stat
, fstat
returns 0
on success and -1
on failure.
The lstat
function is like stat
, except that it does not
follow symbolic links. If filename is the name of a symbolic
link, lstat
returns information about the link itself; otherwise
lstat
works like stat
.
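The difference between stat and lstat can be sketched with two small helpers. Both names (`size_by_name` and `path_is_symlink`) are hypothetical. The second uses the S_ISLNK macro from sys/stat.h on the st_mode member.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: size of the file a name refers to, following
   symbolic links. Returns -1 if stat() fails (for example, when a
   link points to a nonexistent file). */
off_t size_by_name(const char *filename)
{
    struct stat sb;

    if (stat(filename, &sb) == -1)
        return -1;
    return sb.st_size;
}

/* Hypothetical helper: 1 if the name itself is a symbolic link,
   0 if not, -1 if lstat() fails. */
int path_is_symlink(const char *filename)
{
    struct stat sb;

    if (lstat(filename, &sb) == -1)     /* does not follow the link */
        return -1;
    return S_ISLNK(sb.st_mode) ? 1 : 0;
}
```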
When you read the attributes of a file, they come back in a structure
called struct stat
. This section describes the names of the
attributes, their data types, and what they mean.
The stat
structure type is used to return information about the
attributes of a file. It contains at least the following members:
Member | Description |
---|---|
mode_t st_mode | Specifies the mode of the file. This includes file type information (see Testing File Type) and the file permission bits (see Permission Bits). |
ino_t st_ino | The file serial number, which distinguishes this file from all other files on the same device. |
dev_t st_dev | Identifies the device containing the file. The st_ino and st_dev, taken together, uniquely identify the file. The st_dev value is not necessarily consistent across reboots or system crashes, however. |
nlink_t st_nlink | The number of hard links to the file. This count keeps track of how many directories have entries for this file. If the count is ever decremented to zero, then the file itself is discarded as soon as no process still holds it open. Symbolic links are not counted in the total. |
uid_t st_uid | The user ID of the file's owner. |
gid_t st_gid | The group ID of the file. |
off_t st_size | This specifies the size of a regular file in bytes. For files that are really devices this field isn't usually meaningful. For symbolic links this specifies the length of the file name the link refers to. |
time_t st_atime | This is the last access time for the file. |
unsigned long int st_atime_usec | This is the fractional part of the last access time for the file. |
time_t st_mtime | This is the time of the last modification to the contents of the file. |
unsigned long int st_mtime_usec | This is the fractional part of the time of the last modification to the contents of the file. |
time_t st_ctime | This is the time of the last modification to the attributes of the file. |
unsigned long int st_ctime_usec | This is the fractional part of the time of the last modification to the attributes of the file. |
blkcnt_t st_blocks | This is the amount of disk space that the file occupies, measured in units of 512-byte blocks. |
unsigned int st_blksize | The optimal block size for reading or writing this file, in bytes. You might use this size for allocating the buffer space for reading or writing the file. (This is unrelated to st_blocks.) |
Some of the file attributes have special data type names which exist specifically for those attributes. (They are all aliases for well-known integer types that you know and love.) These typedef names are defined in the header file sys/types.h as well as in sys/stat.h.
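Because st_dev and st_ino together identify a file, two names can be compared to see whether they refer to the same file. A minimal sketch, with the hypothetical helper name `same_file`:

```c
#include <sys/stat.h>

/* Hypothetical helper: 1 if both names refer to the same file
   (same device and same file serial number), 0 if they differ,
   -1 if either stat() call fails. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return -1;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Since stat follows symbolic links, a link and its target compare as the same file.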
The number of disk blocks in st_blocks
is not strictly proportional to the size of
the file, for two reasons: the file system may use some blocks for
internal record keeping; and the file may be sparse—it may have
“holes” which contain zeros but do not actually take up space on the
disk.
You can tell (approximately) whether a file is sparse by comparing this
value with st_size
, like this:
(st.st_blocks * 512 < st.st_size)
This test is not perfect because a file that is just slightly sparse might not be detected as sparse at all. For practical applications, this is not a problem.
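The comparison above can be wrapped in a function. The name `looks_sparse` is hypothetical, and the result is only the approximate test described in the text, not a definitive answer.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: approximate sparseness test.
   Returns 1 if the file occupies fewer bytes on disk than its size
   suggests, 0 if not, -1 if stat() fails. */
int looks_sparse(const char *filename)
{
    struct stat st;

    if (stat(filename, &st) == -1)
        return -1;

    /* st_blocks counts 512-byte units; widen before multiplying. */
    return ((off_t) st.st_blocks * 512 < st.st_size) ? 1 : 0;
}
```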
Each file has three time stamps associated with it: its access time,
its modification time, and its attribute modification time. These
correspond to the st_atime
, st_mtime
, and st_ctime
members of the stat
structure.
All of these times are represented in calendar time format, as
time_t
objects. This data type is defined in time.h.
For more information about representation and manipulation of time
values, see Calendar Time.
Reading from a file updates its access time attribute, and writing
updates its modification time. When a file is created, all three
time stamps for that file are set to the current time. In addition, the
attribute change time and modification time fields of the directory that
contains the new entry are updated.
The file mode, stored in the st_mode
field of the file
attributes, contains two kinds of information: the file type code, and
the access permission bits. This section discusses only the type code,
which you can use to tell whether the file is a directory, socket,
symbolic link, and so on. For details about access permissions see
Permission Bits.
There are two ways you can access the file type information in a file mode. Firstly, for each file type there is a predicate macro which examines a given file mode and returns whether it is of that type or not. Secondly, you can mask out the rest of the file mode to leave just the file type code, and compare this against constants for each of the supported file types.
All of the symbols listed in this section are defined in the header file
sys/stat.h.
The following predicate macros test the type of a file, given the value
m which is the st_mode
field returned by stat
on
that file:
Macro | Description |
---|---|
int S_ISDIR (mode_t m) | non-zero if the file is a directory. |
int S_ISCHR (mode_t m) | non-zero if the file is a character special file (a device like a terminal). |
int S_ISBLK (mode_t m) | non-zero if the file is a block special file (a device like a disk). |
int S_ISREG (mode_t m) | non-zero if the file is a regular file. |
int S_ISFIFO (mode_t m) | non-zero if the file is a FIFO special file, or a pipe. See Pipes and FIFOs. |
int S_ISLNK (mode_t m) | non-zero if the file is a symbolic link. See Symbolic Links. |
int S_ISSOCK (mode_t m) | non-zero if the file is a socket. See Sockets. |
The file mode, stored in the st_mode
field of the file
attributes, contains two kinds of information: the file type code, and
the access permission bits. This section discusses only the access
permission bits, which control who can read or write the file.
All of the symbols listed in this section are defined in the header file sys/stat.h. These symbolic constants are defined for the file mode bits that control access permission for the file:
The actual bit values of the symbols are listed in the table above so you can decode file mode values when debugging your programs. These bit values are correct for most systems, but they are not guaranteed.
Warning: Writing explicit numbers for file permissions is bad practice. Not only is it not portable, it also requires everyone who reads your program to remember what the bits mean. To make your program clean use the symbolic names.
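As an illustration of using symbolic names instead of numbers, the sketch below renders the nine basic permission bits as an `ls`-style string. It relies on the POSIX constants S_IRUSR through S_IXOTH from sys/stat.h. The function name `permission_string` is hypothetical, and the set-user-ID, set-group-ID and sticky bits are ignored.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: fill `out` with a string such as "rw-r--r--"
   describing the permission bits in `m`. */
void permission_string(mode_t m, char out[10])
{
    out[0] = (m & S_IRUSR) ? 'r' : '-';   /* owner */
    out[1] = (m & S_IWUSR) ? 'w' : '-';
    out[2] = (m & S_IXUSR) ? 'x' : '-';
    out[3] = (m & S_IRGRP) ? 'r' : '-';   /* group */
    out[4] = (m & S_IWGRP) ? 'w' : '-';
    out[5] = (m & S_IXGRP) ? 'x' : '-';
    out[6] = (m & S_IROTH) ? 'r' : '-';   /* others */
    out[7] = (m & S_IWOTH) ? 'w' : '-';
    out[8] = (m & S_IXOTH) ? 'x' : '-';
    out[9] = '\0';
}
```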
For a directory the sticky bit gives permission to delete a file in that directory only if you own that file. Ordinarily, a user can either delete all the files in a directory or cannot delete any of them (based on whether the user has write permission for the directory). The same restriction applies—you must have both write permission for the directory and own the file you want to delete. The one exception is that the owner of the directory can delete any file in the directory, no matter who owns it (provided the owner has given himself write permission for the directory). This is commonly used for the /tmp directory, where anyone may create files but not delete files created by other users.
Originally the sticky bit on an executable file modified the swapping policies of the system. Normally, when a program terminated, its pages in core were immediately freed and reused. If the sticky bit was set on the executable file, the system kept the pages in core for a while as if the program were still running. This was advantageous for a program likely to be run many times in succession. This usage is obsolete in modern systems. When a program terminates, its pages always remain in core as long as there is no shortage of memory in the system. When the program is next run, its pages will still be in core if no shortage arose since the last run.
On some modern systems where the sticky bit has no useful meaning for an
executable file, you cannot set the bit at all for a non-directory.
If you try, chmod fails with EFTYPE.
Some systems (particularly SunOS) have yet another use for the sticky bit. If the sticky bit is set on a file that is not executable, it means the opposite: never cache the pages of this file at all. The main use of this is for the files on an NFS server machine which are used as the swap area of diskless client machines. The idea is that the pages of the file will be cached in the client's memory, so it is a waste of the server's memory to cache them a second time. With this usage the sticky bit also implies that the filesystem may fail to record the file's modification time onto disk reliably (the idea being that no-one cares for a swap file).
#include <unistd.h> int truncate (const char *filename, off_t length); int ftruncate (int fd, off_t length); |
The truncate
function changes the size of filename to
length. If length is shorter than the previous length, data
at the end will be lost. The file must be writable by the user to
perform this operation.
If length is longer, holes will be added to the end. However, some systems do not support this feature and will leave the file unchanged.
The return value is 0 for success, or -1 for an error.
The ftruncate
function is like truncate
,
but it works on a file descriptor fd for an opened file
instead of a file name to identify the object. The file must be
opened for writing to successfully carry out the operation.
Low-Level Input/Output (from the GNU C Library Reference Manual).
Sumitabha Das, Your Unix, the Ultimate Guide, Second Edition,
McGraw-Hill, 2006. ISBN 0-07-252042-6. chapter 16
Neil Matthew and Richard Stone, Beginning Linux Programming,
Third Edition,
Wrox, 2004. ISBN 0-7645-4497-7. p 96-106.
W. Richard Stevens and Stephen A. Rago, Advanced Programming in the UNIX Environment, Second Edition, Addison Wesley, 2005. ISBN 0-201-43307-9. p 60-70
Maintained by John Loomis, last updated 10 September 2006