Debating overlayfs

By Jonathan Corbet
June 15, 2011

Union filesystems allow multiple filesystems to be combined and presented to the user as a single tree. In typical use, a writable filesystem is overlaid on top of a read-only base, creating the illusion that all files on the filesystem can be changed. This mode of operation is useful for live CD distributions, embedded systems where a quick "factory reset" capability is desired, virtualized systems built on a common base filesystem, and more. Despite the value of this feature and several attempts to create one, Linux has never had an in-kernel union filesystem. A recent attempt to change that situation may or may not succeed.

LWN looked at the overlayfs filesystem last year. Overlayfs, written by Miklos Szeredi, is distinguished by its relative simplicity. Recently, Miklos asked if overlayfs could be merged for the 3.1 development cycle. He may get his wish, but some worries will have to be addressed first.

Andrew Morton has raised a couple of concerns, one of which is that the problem might be better solved in user space. He dismissed the simplicity of overlayfs, saying "Not merging it would be even smaller and simpler," and suggested that performance problems should be addressed by making the user-space implementation faster. Linus has pretty much ended that aspect of the debate by saying "People who think that userspace filesystems are realistic for anything but toys are just misguided." So the way seems to be clear for a union filesystem implementation in the kernel.

Andrew's other concern is that overlayfs may not be a sufficiently complete solution:

> If overlayfs doesn't appreciably decrease the motivation to merge other unioned filesystems then we might end up with two similar-looking things. And, I assume, the later and more fully-blown implementation might make overlayfs obsolete but by that time it will be hard to remove.

That objection is harder to answer. It has been pointed out that OpenWRT is happily using overlayfs and Ubuntu is considering it. About the only viable alternative project is union mounts, which has not seen much developer attention recently. On the feature front, it doesn't seem like anything else will come along and outshine overlayfs in the near future.

On the technical side, union filesystems have always presented some unique challenges. Valerie Aurora, who has done a fair amount of work in this area, looked at overlayfs in March and seemed to be positive about it:

> I took a quick look at the current overlayfs patch set, and it's small, clean, and easy to understand. If it does what people need, I say ship it.

She has changed her tune a bit in the current discussion, suggesting that there are some difficulties which need to be addressed:

> Overlayfs is not the simplest possible solution at present. For example, it currently does not prevent modification of the underlying file system directories, which is absolutely required to prevent bugs according to Al. Al proposed a solution he was happy with (read-only superblocks), I implemented it for union mounts, and I believe it can be ported to overlayfs. But that should happen *before* merging.

She raised some locking concerns as well, which Miklos addressed in detail; the concern about changing the underlying filesystem has not been answered, though. So it's possible that technical correctness issues may yet delay the merging of overlayfs into the kernel. That said, it seems clear that there is demand for this feature, and that overlayfs appears to satisfy that demand nicely. There will likely come a time when keeping it out of the kernel becomes too hard to justify.

Shared inodes

Posted Jun 16, 2011 19:44 UTC (Thu) by martinfick (subscriber, #4455) [Link] (6 responses)

What we really need from a union type filesystem is for identical inodes in the bottom layer, to somehow show up as the same inode with respect to the memory management subsystem, no matter where they appear in upper layers. This needs to happen even if these show up in different top layer mount points to be truly beneficial. This would be a huge boon for sharing memory amongst processes in separate containers which run the same underlying executable. The containers could share a readonly bottom layer and yet have individual writable top layers in their individual namespaces, preventing them from clobbering the other container's files while still sharing memory efficiently on common executables. Of course, I am not sure how that could actually be done... :(

Shared inodes

Posted Jun 16, 2011 21:46 UTC (Thu) by ndye (guest, #9947) [Link]

> Of course, I am not sure how that could actually be done... :(

Neither do I, and you paint the benefits well . . .

. . . but now your headache has gone viral.
;-)

Shared inodes

Posted Jun 17, 2011 6:38 UTC (Fri) by neilbrown (subscriber, #359) [Link] (3 responses)

You are in luck - overlayfs provides exactly what you want. Assuming I am understanding you correctly.

When you open a file (not a directory) read-only, and that file doesn't exist in the upper layer, you get exactly the file from the lower layer. If you fstat the file descriptor it will look exactly like the lower-layer file - st_dev, st_ino and all. It really is the lower-level file.
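The claim about st_dev and st_ino can be checked from user space. The following is a minimal sketch of such a check, not part of the overlayfs patch set; the helper name `same_file` and the paths in the trailing comment are hypothetical, and whether the comparison succeeds on a given kernel depends on the overlayfs version in use.

```python
import os

def same_file(a, b):
    """Return True if a and b (paths or open file descriptors)
    report the same (st_dev, st_ino) pair."""
    sa = os.stat(a)
    sb = os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)

# Hypothetical usage, assuming /lower is the read-only layer and
# /overlay is the union mounted on top of it:
#
#   fd = os.open("/overlay/usr/sbin/apache2", os.O_RDONLY)
#   print(same_file(fd, "/lower/usr/sbin/apache2"))
#   os.close(fd)
```

Passing a file descriptor makes `os.stat` behave like fstat, which matches the check described above.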

So much so that if someone else opens the file for 'write', it will get copied into the upper layer and they will get a handle on the file in the upper layer which they can then change, but you will still have a handle on the lower level file which, of course, will not see those changes.
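The read and copy-on-write behaviour described here can be modelled in a few lines. This is a toy in-memory sketch only, assuming dictionaries stand in for the two layers; the class `ToyOverlay` and its method names are invented for illustration and say nothing about how the kernel code is structured.

```python
class ToyOverlay:
    """Toy model of a two-layer union; dicts stand in for filesystems."""

    def __init__(self, lower):
        self.lower = lower   # read-only layer: name -> bytearray
        self.upper = {}      # writable layer, initially empty

    def open_read(self, name):
        # Prefer the upper layer; otherwise hand back the lower object itself.
        if name in self.upper:
            return self.upper[name]
        return self.lower[name]

    def open_write(self, name):
        # Copy-up: the first write-open duplicates the lower file into upper.
        if name not in self.upper:
            self.upper[name] = bytearray(self.lower.get(name, b""))
        return self.upper[name]


lower = {"httpd.conf": bytearray(b"Listen 80\n")}
ovl = ToyOverlay(lower)

reader = ovl.open_read("httpd.conf")    # the lower-layer object itself
writer = ovl.open_write("httpd.conf")   # triggers the copy-up
writer[:] = b"Listen 8080\n"            # modify the upper copy in place

print(reader is lower["httpd.conf"])        # True
print(bytes(reader))                        # b'Listen 80\n'
print(bytes(ovl.open_read("httpd.conf")))   # b'Listen 8080\n'
```

The earlier reader keeps seeing the lower-layer contents, while any open after the copy-up resolves to the upper layer.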

Shared inodes

Posted Jun 17, 2011 16:14 UTC (Fri) by martinfick (subscriber, #4455) [Link] (2 responses)

> Assuming I am understanding you correctly.

So with overlayfs, if I have 1000 containers each with their own upper layer mounted separately on top of the same lower layer, and each one of them runs the same copy of apache, will the linux MM system share most of the memory for those apache executables, as much as if they all ran off of the same file in the lower layer directly?

If so, this will be a major boon for "virtualisation" on linux, extremely memory efficient and lightweight containers. This would allow linux containers in the mainline to share some of the ideas and similar benefits to the linux vserver project's "unification".

Shared inodes

Posted Jun 19, 2011 22:58 UTC (Sun) by Sho (subscriber, #8956) [Link] (1 responses)

Don't shared subtrees get you a long part of the way, too?

Shared inodes

Posted Jun 19, 2011 23:43 UTC (Sun) by neilbrown (subscriber, #359) [Link]

Shared subtrees are certainly part of the solution - and an important part.

If the Linux/Unix file hierarchy had been designed with sufficient foresight (which would have been totally impractical in reality) then you probably could do it all with shared subtrees. Those files that might need to be configured per-machine or per-instance would be in one subtree (a bit like /var maybe) and all the other files would be elsewhere. The one subtree would be copied for each instance, the rest would be shared.

But we don't have such a forward-looking design ... and it is entirely possible that differing needs are such that such a design would be impossible. So configuration files are often mixed in with non-configuration files. A solution is needed which makes copies of the first type, but shares the second type.

One could imagine a forest-of-symlinks which could map all 'configuration' files into one subtree, but symlinks don't always (ever?) provide perfect semantics. If you update a config file by writing a new copy then renaming it, you break the symlink.
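The rename problem is easy to reproduce. The sketch below assumes a POSIX system where rename(2) replaces a symlink rather than following it; the directory names, file names and the helper `update_by_rename` are made up for the demonstration.

```python
import os
import tempfile

def update_by_rename(path, data):
    """Update path the usual 'safe' way: write a new copy, then rename over it."""
    tmp = path + ".new"
    with open(tmp, "w") as f:
        f.write(data)
    os.replace(tmp, path)   # replaces the directory entry, symlink or not

with tempfile.TemporaryDirectory() as root:
    conf_dir = os.path.join(root, "config")   # hypothetical per-instance subtree
    etc_dir = os.path.join(root, "etc")       # hypothetical shared location
    os.mkdir(conf_dir)
    os.mkdir(etc_dir)

    real = os.path.join(conf_dir, "app.conf")
    link = os.path.join(etc_dir, "app.conf")
    with open(real, "w") as f:
        f.write("old\n")
    os.symlink(real, link)

    update_by_rename(link, "new\n")

    link_survived = os.path.islink(link)
    with open(real) as f:
        real_contents = f.read()
    with open(link) as f:
        link_contents = f.read()

print(link_survived, repr(real_contents), repr(link_contents))
# False 'old\n' 'new\n'
```

After the update, the entry under `etc` is a regular file holding the new data, and the file in the per-instance subtree still holds the old contents.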

You could do the symlinks in the other direction, with symlinks for all the files that you want to share, but that would have its own problems I suspect.

So overlayfs complements shared subtrees and allows you to selectively have some files shared and some files private within the same directory. And it achieves this almost transparently.

Shared inodes

Posted Feb 25, 2012 3:18 UTC (Sat) by scientes (guest, #83068) [Link]

What about vhashify http://linux-vserver.org/util-vserver:Vhashify ?
IOW hard-links on steroids.
Now, making this work in full-virtualization environments is not exactly the same problem... and certainly can't be as elegant.

Debating overlayfs

Posted Jun 21, 2011 9:51 UTC (Tue) by nikanth (guest, #50093) [Link] (1 responses)

Anyway, the writable filesystem on top is not usable by itself. The data on the writable disk is meaningful only on top of the read-only disk (fs).

Wouldn't it be better if COW filesystems like btrfs could provide a feature to write new blocks only to the writable disk, instead of going for generalized solutions? Btrfs would need a way to check for the root of the tree (superblock) on the new disk before using the one from the read-only disk.

Debating overlayfs

Posted Aug 2, 2012 11:26 UTC (Thu) by bluss (subscriber, #47454) [Link]

(this is happening with btrfs now, it's called "seed device")

Debating overlayfs

Posted Jun 22, 2011 18:24 UTC (Wed) by rilder (guest, #59804) [Link]

How does this compare to aufs{1,2}, which have been in actual use by distros over the past few years?

Looks like the developer made an effort here to get it into the tree (http://thread.gmane.org/gmane.linux.file-systems/29813), but I'm not sure where the discussion went.