Using mmap() for Advanced File I/O

In this fourth part of a seven-part series on Linux I/O file system calls, you will learn how to use mmap(). It is excerpted from chapter four of the book Linux System Programming: Talking Directly to the Kernel and C Library, written by Robert Love (O'Reilly, 2007; ISBN: 0596009585). Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

TABLE OF CONTENTS:

By: O'Reilly Media

Rating:

/ 5

December 18, 2008

print this article

SEARCH DEV SHED

The only unfamiliar system call in this example should befstat(), which we will cover in Chapter 7. All you need to know at this point is that fstat()returns information about a given file. TheS_ISREG()macro can check some of this information, so that we can ensure that the given file is a regular file (as opposed to a device file or a directory) before we map it. The behavior of nonregular files when mapped depends on the backing device. Some device files are mmap-able; other nonregular files are not mmap-able, and will seterrnotoEACCESS.

The rest of the example should be straightforward. The program is passed a filename as an argument. It opens the file, ensures it is a regular file, maps it, closes it, prints the file byte-by-byte to standard out, and then unmaps the file from memory.

Advantages of mmap()

Manipulating files via mmap() has a handful of advantages over the standard read() andwrite()system calls. Among them are:

Reading from and writing to a memory-mapped file avoids the extraneous copy that occurs when using theread()orwrite()system calls, where the data must be copied to and from a user-space buffer.
Aside from any potential page faults, reading from and writing to a memory-mapped file does not incur any system call or context switch overhead. It is as simple as accessing memory.
When multiple processes map the same object into memory, the data is shared among all the processes. Read-only and shared writable mappings are shared in their entirety; private writable mappings have their not-yet-COW (copy-on-write) pages shared.
Seeking around the mapping involves trivial pointer manipulations. There is no need for thelseek()system call.

For these reasons,mmap()is a smart choice for many applications.

Disadvantages of mmap()

There are a few points to keep in mind when using mmap():

Memory mappings are always an integer number of pages in size. Thus, the difference between the size of the backing file and an integer number of pages is “wasted” as slack space. For small files, a significant percentage of the mapping may be wasted. For example, with 4 KB pages, a 7 byte mapping wastes 4,089 bytes.
The memory mappings must fit into the process’ address space. With a 32-bit address space, a very large number of various-sized mappings can result in fragmentation of the address space, making it hard to find large free contiguous regions. This problem, of course, is much less apparent with a 64-bit address space.
There is overhead in creating and maintaining the memory mappings and associated data structures inside the kernel. This overhead is generally obviated by the elimination of the double copy mentioned in the previous section, particularly for larger and frequently accessed files.

For these reasons, the benefits ofmmap()are most greatly realized when the mapped file is large (and thus any wasted space is a small percentage of the total mapping), or when the total size of the mapped file is evenly divisible by the page size (and thus there is no wasted space).

Next: Resizing a Mapping >>