This is Not Tail-Recursive, Because You Recurse from a Closure
The return traverse_directory(
...)
statement would be tail-recursive from traverse_directory
, but it is being called from a closure. That’s equivalent to calling from a named helper function. The function and the anonymous closure within it could still be mutually-recursive if the closure were tail-called, and get most of the same optimizations, but that is not the case. It is called from a for_each
, and must create a new stack frame each time so that the loop can resume.
(I deleted my original first paragraph, because your use of return
from a closure is correct, just not tail-recursion.)
Rust Does Not Guarantee Tail-Call Optimization Anyway
Tail-recursive algorithms in Rust might compile to unoptimized calls that eat up the stack, and you will not be warned when this happens. I’ve generally had good results with functions that have moved or dropped all their local values that aren’t references when they tail-recurse, and therefore have no need to keep anything alive during the tail-call and drop it after, but this will not be reliable in Rust, unless the language changes.
Indeed, testing with Godbolt shows that this call is not optimized as a tail-call.
Graydon Hoare has called his announcement “one of the saddest posts ever written on the subject,” and later said that this was one of the times he was outvoted on the design.
The List Should be the Return Value
It does not make much sense to require the caller to pass in &mut Vec::new()
so the function can fill it. Just create the Vec
yourself and pass it back. This allows callers to chain and compose this function, and is one less thing that can go wrong with the public API.
If I were writing a tail-recursive function with an accumulator in a functional language, I would probably write the version with the accumulator as a private helper, have the public function call it with the accumulator set to 0/an empty list/whatever, and return the folded value. This keeps the benefits of immutable data, but in a language like Haskell, it’s also the only real option.
In Rust, it can sometimes be an optimization to pass around references instead of moving parameters, but that is not as true for an efficiently-moveable type like Vec
. If the compiler is able to optimize out tail-recursion, it is also able to optimize out a move from a parameter to itself.
Refactor with Either a for
Loop or an Iterator
If You Use a for
Loop
This is the simplest solution. Create a new mut Vec<String>
. For each element in the list, either push
the name of the regular file to the result Vec
, or append
the list returned by a recursive call to the function.
If You Use Iterators
As an alternative to for_each
, you could write a quasi-functional solution here. Write a helper function that turns a DirEntry
into either a sequence of all files within a recursive descent of the directory, or a singleton of its filename. Then, do a flat_map
over all sequences generated by the directory entries.
Iterators often lead to elegant, fluent code, but here you have to pick your poison. Each directory item adds either no names (if invalid), one name (if it is a regular file or symlink) or at least one but possibly more (if it is a directory). That means you will have to convert the result of all three cases into some common type. One of them, a recursive call, will return a Vec<String>
.
You can therefore:
- Represent all three cases as a
Vec<String>
, whose length might be 0 or 1, and do a lot of tiny heap allocations for all the singletons. - Convert all three cases into a
Box<dyn Iterator<Item = String>>
and use runtime polymorphism, creating many tiny objects on the heap. - Convert all three cases into a
&[String]
without using the heap, but now be forced to clone
each borrowed &String
.
An implementation that does the first of these (which should be simpler and have less overhead than the other two):
use std::fs::{self, DirEntry}; pub fn traverse_directory(directory: &str) -> Vec<String> { fn dirent_helper(entry: DirEntry) -> Vec<String> { let filetype = entry.file_type().expect("Invalid file type"); if filetype.is_file() || filetype.is_symlink() { vec!{entry.path().to_str().expect("Invalid filename").to_string()} } else if filetype.is_dir() { traverse_directory(entry.path().to_str().expect("Invalid directory")) } else { Vec::new() } } fs::read_dir(directory) .expect("Error reading directory") .flat_map(|entry| dirent_helper(entry.expect("Invalid directory entry"))) .collect() } pub fn main() { println!("{:?}", traverse_directory(".")); }
For simplicity, this panics on any error (including not having read permission on any directory), but you can change that behavior to return a std::io::Result
(perhaps with the ?
operator) or silently ignore them and return an empty list. It also does not return the names of empty directories, which you might want it to.
You Could Also fold
Since you have a sequence of DirEntry
objects and want to accumulate a list of pathnames, a better way to do that might be as a fold
that appends to the accumulator. This mixes in some non-functional operations in Rust (since Vec::append
and Vec::push
are implemented as side-effect functions on mut
data).
use std::fs::{self, DirEntry}; pub fn traverse_directory(directory: &str) -> Vec<String> { fn dirent_helper(mut list: Vec<String>, entry: DirEntry) -> Vec<String> { let filetype = entry.file_type().expect("Invalid file type"); if filetype.is_file() || filetype.is_symlink() { list.push(entry.path().to_str().expect("Invalid filename").to_string()) } else if filetype.is_dir() { list.append(&mut traverse_directory( entry.path().to_str().expect("Invalid directory"))) } list } fs::read_dir(directory) .expect("Error reading directory") .map(|entry| entry.expect("Invalid directory entry")) .fold(Vec::new(), dirent_helper) } pub fn main() { println!("{:?}", traverse_directory(".")); }
This is a nearly-zero-cost abstraction for what you were doing. (See for yourself on Godbolt.) The path that does not recursively descend compiles to a loop with no temporary objects created on the heap. All the gory details stay inside the function, hidden from the caller. However, there is one unnecessary move of list
per iteration, costing four instructions on X86-64. (This is a missed optimization: since dirent_helper
is a local function that cannot be called from anywhere else, the compiler could have passed list
in the same registers it used for the return value, and could then have modified it in-place.)
Don’t Shadow so Many Names
You re-use entry
something like three times in a row. This is confusing to a maintainer, and also runs the risk of a typo making the compiler think it should use an earlier version.