
Background

Traversing a file hierarchy, that is, visiting every file and sub-directory below a given directory, is a fairly common task in file system search operations of various kinds. I have already encountered several applications where I needed to do this as part of my solution, so I figured I should move this code into a separate library.

Goal

I am trying to write a very small library for file and directory traversal, where simplicity and performance are favored over feature richness. My intention, however, is that the design of this library should not limit any programmer using it in their own application, which is why I have made FileWalker implement the Iterator trait.

By file and directory traversal I mean that, given the path of a directory, I should be able to iterate through all files and sub-directories in that directory, and recursively through the contents of its sub-directories. This should continue until all files and directories have been "consumed", or until a maximum recursion depth has been reached.
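
The traversal described above amounts to a breadth-first walk with a depth limit. Below is a minimal sketch of that idea, independent of the library. walk_files is a hypothetical name, and the results are collected into a Vec for brevity rather than produced lazily through Iterator.

```rust
use std::collections::VecDeque;
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: breadth-first walk that collects every file below
/// `root`, entering sub-directories at most `max_depth` levels deep.
/// `Path::is_file`/`Path::is_dir` follow symlinks, so this behaves like
/// `follow_symlinks = true`.
fn walk_files(root: &Path, max_depth: u32) -> Vec<PathBuf> {
    let mut files = Vec::new();
    // Every queued directory carries its depth relative to `root` (root = 0).
    let mut dirs: VecDeque<(PathBuf, u32)> = VecDeque::new();
    dirs.push_back((root.to_path_buf(), 0));

    while let Some((dir, depth)) = dirs.pop_front() {
        let entries = match fs::read_dir(&dir) {
            Ok(entries) => entries,
            Err(_) => continue, // unreadable directory: skip it
        };
        for entry in entries.filter_map(Result::ok) {
            let path = entry.path();
            if path.is_file() {
                files.push(path);
            } else if path.is_dir() && depth < max_depth {
                // Only descend while we are still above the depth limit.
                dirs.push_back((path, depth + 1));
            }
        }
    }
    files
}
```

As in test_depth_only_root_dir and test_depth_one, a max_depth of 0 returns only the files directly inside the root. Because each queued directory stores its own depth, the depth never has to be recomputed from canonicalized paths.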

Code

The full project can be found on GitHub; the current commit at the time of writing is a7c84e27. All Rust source code is, however, included in this post.

```rust
use std::collections::VecDeque;
use std::fs;
use std::fs::{Metadata, ReadDir};
use std::path::PathBuf;

#[derive(Default)]
pub struct FileWalker {
    files: VecDeque<PathBuf>,
    dirs: VecDeque<PathBuf>,
    origin: PathBuf,
    max_depth: u32,
    follow_symlinks: bool,
}

impl FileWalker {
    /// Create a new FileWalker starting from the current directory (path `.`).
    /// This FileWalker will not follow symlinks and will not have any limitation
    /// in recursion depth for directories.
    pub fn new() -> FileWalker {
        FileWalker::for_path(&PathBuf::from("."), std::u32::MAX, false)
    }

    /// Create a new FileWalker for the given path, while also specifying the
    /// max recursion depth and if symlinks should be followed or not.
    ///
    /// With a directory structure of
    ///
    /// ```yaml
    /// test_dirs:
    ///   - file0
    ///   sub_dir:
    ///     - file1
    ///     - file2
    /// ```
    ///
    /// the FileWalker should return the files as follows
    /// ```
    /// use std::path::PathBuf;
    ///
    /// let path = PathBuf::from("test_dirs");
    /// let max_depth: u32 = 100;
    /// let follow_symlinks: bool = false;
    /// let mut walker = walker::FileWalker::for_path(&path, max_depth, follow_symlinks);
    ///
    /// assert_eq!(Some(PathBuf::from("test_dirs/file0").canonicalize().unwrap()), walker.next());
    /// assert_eq!(Some(PathBuf::from("test_dirs/sub_dir/file2").canonicalize().unwrap()), walker.next());
    /// assert_eq!(Some(PathBuf::from("test_dirs/sub_dir/file1").canonicalize().unwrap()), walker.next());
    /// assert_eq!(None, walker.next());
    /// ```
    pub fn for_path(path: &PathBuf, max_depth: u32, follow_symlinks: bool) -> FileWalker {
        if !path.is_dir() {
            panic!("Path is not a directory: {:?}", path);
        }
        let mut dirs = VecDeque::with_capacity(1);
        dirs.push_back(path.clone());
        let files = VecDeque::with_capacity(0);
        FileWalker {
            files,
            dirs,
            origin: path.clone(),
            max_depth,
            follow_symlinks,
        }
    }

    fn load(&self, path: &PathBuf) -> Result<(Vec<PathBuf>, Vec<PathBuf>), std::io::Error> {
        let path: ReadDir = read_dirs(&path)?;
        let (files, dirs) = path
            .filter_map(|p| p.ok())
            .map(|p| p.path())
            .filter(|p: &PathBuf| self.follow_symlinks || !is_symlink(p))
            .filter(is_valid_target)
            .partition(|p| p.is_file());
        Ok((files, dirs))
    }

    fn push(&mut self, path: &PathBuf) {
        match self.load(path) {
            Ok((files, dirs)) => {
                self.files.extend(files);
                let current_depth: u32 = self.depth(path) as u32;
                if current_depth < self.max_depth {
                    self.dirs.extend(dirs);
                }
            }
            Err(e) => log::warn!("{}: {:?}", e, path),
        }
    }

    fn depth(&self, dir: &PathBuf) -> usize {
        let comps0 = self.origin.canonicalize().unwrap().components().count();
        let comps1 = dir.canonicalize().unwrap().components().count();
        comps1 - comps0
    }
}

impl Iterator for FileWalker {
    type Item = PathBuf;

    fn next(&mut self) -> Option<Self::Item> {
        match self.files.pop_front() {
            Some(f) => Some(f),
            None => match self.dirs.pop_front() {
                Some(d) => {
                    self.push(&d);
                    self.next()
                }
                None => None,
            },
        }
    }
}

fn read_dirs(path: &PathBuf) -> Result<ReadDir, std::io::Error> {
    let full_path: PathBuf = path.canonicalize()?;
    Ok(fs::read_dir(full_path)?)
}

fn is_valid_target(path: &PathBuf) -> bool {
    let metadata: Metadata = path.metadata().expect("Unable to retrieve metadata:");
    metadata.is_file() || metadata.is_dir()
}

fn is_symlink(path: &PathBuf) -> bool {
    match path.symlink_metadata() {
        Ok(sym) => sym.file_type().is_symlink(),
        Err(err) => {
            log::warn!("{}: {:?}", err, path);
            false
        }
    }
}

#[cfg(test)]
mod tests {
    use crate::FileWalker;
    use std::path::PathBuf;

    const TEST_DIR: &str = "test_dirs";

    #[test]
    fn test_depth_only_root_dir() {
        let dir = PathBuf::from(TEST_DIR);
        let found = FileWalker::for_path(&dir, 0, false).count();
        assert_eq!(1, found);
    }

    #[test]
    fn test_depth_one() {
        let dir = PathBuf::from(TEST_DIR);
        let found = FileWalker::for_path(&dir, 1, false).count();
        assert_eq!(3, found);
    }
}
```
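
The doctest for for_path expects file2 to be returned before file1. The documentation of std::fs::read_dir, however, states that the order of its entries is platform and filesystem dependent, so that doctest may fail on another machine. If a stable order is wanted, one option is to sort each directory's entries before queuing them. The following is a minimal sketch of that option; sorted_entries is a hypothetical helper, not part of the library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists the entries of `dir` sorted by path, so every
/// platform sees the same order. Unreadable entries are skipped.
fn sorted_entries(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = fs::read_dir(dir)?
        .filter_map(Result::ok) // drop entries that could not be read
        .map(|entry| entry.path())
        .collect::<Vec<PathBuf>>();
    // All paths share the same parent, so this orders them by file name.
    paths.sort();
    Ok(paths)
}
```

Since partition keeps the order of its input, sorting in load before partitioning would also sort the files. The doctest would then expect file1 before file2.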

Concerns & Priorities

  • Performance - I do not have a lot of experience or know-how when it comes to performance and performance optimization in Rust. Any feedback on how I can improve performance is most welcome.
  • Testing - I have written two test cases and included a doctest in the documentation for FileWalker::for_path, but I am not sure whether this should be considered enough. I would have preferred a setup where I could run these test cases without actual files having to be present on the hard drive or being part of the project. I don't know how realistic that is; maybe it is more work than it is worth. The files used for testing can be found in the GitHub repo.
  • Documentation - Is something unclear? I have only documented public methods; should I add further documentation?
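
On the performance question: for every directory entry, load may call symlink_metadata (in is_symlink, when follow_symlinks is false) and also calls metadata twice (in is_valid_target and through Path::is_file). In addition, each visited directory is canonicalized in read_dirs and again in depth, which also re-canonicalizes origin every time. DirEntry::file_type does not follow symlinks, and the standard library documents it as needing no extra system calls on Windows and most Unix platforms, so it can often replace those per-entry calls. Below is a sketch of a load-like function built on it. It assumes that skipping symlinks entirely is acceptable (the follow_symlinks = false case), and split_entries is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical replacement for `load`: splits the entries of `dir` into
/// (files, dirs) with one `DirEntry::file_type` call per entry.
/// Symlinks and special files (sockets, FIFOs, ...) are skipped.
fn split_entries(dir: &Path) -> io::Result<(Vec<PathBuf>, Vec<PathBuf>)> {
    let mut files = Vec::new();
    let mut dirs = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = match entry {
            Ok(entry) => entry,
            Err(_) => continue, // unreadable entry: skip it
        };
        // file_type() does not traverse symlinks, so a symlink is neither
        // is_file() nor is_dir() here and falls through untouched.
        let file_type = match entry.file_type() {
            Ok(file_type) => file_type,
            Err(_) => continue,
        };
        if file_type.is_file() {
            files.push(entry.path());
        } else if file_type.is_dir() {
            dirs.push(entry.path());
        }
    }
    Ok((files, dirs))
}
```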

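On testing without files on disk: one possible approach puts directory listing behind a small trait and gives the tests an in-memory implementation. This sketch assumes the listing can be separated from the traversal logic. DirSource, FakeFs and walk are hypothetical names, and FileWalker itself would need to be made generic over such a trait for this to apply.

```rust
use std::collections::{HashMap, VecDeque};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical abstraction over "list the contents of a directory".
trait DirSource {
    /// Returns the (files, sub-directories) directly inside `dir`.
    fn list(&self, dir: &Path) -> io::Result<(Vec<PathBuf>, Vec<PathBuf>)>;
}

/// In-memory fake used by tests: directory -> (files, sub-directories).
#[derive(Default)]
struct FakeFs {
    entries: HashMap<PathBuf, (Vec<PathBuf>, Vec<PathBuf>)>,
}

impl FakeFs {
    /// Registers `dir` with the given file and sub-directory names.
    fn add(&mut self, dir: &str, files: &[&str], dirs: &[&str]) {
        let dir = PathBuf::from(dir);
        let files: Vec<PathBuf> = files.iter().map(|f| dir.join(f)).collect();
        let dirs: Vec<PathBuf> = dirs.iter().map(|d| dir.join(d)).collect();
        self.entries.insert(dir, (files, dirs));
    }
}

impl DirSource for FakeFs {
    fn list(&self, dir: &Path) -> io::Result<(Vec<PathBuf>, Vec<PathBuf>)> {
        self.entries
            .get(dir)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown directory"))
    }
}

/// Breadth-first walk over any DirSource, descending at most `max_depth` levels.
fn walk<S: DirSource>(source: &S, root: &Path, max_depth: u32) -> Vec<PathBuf> {
    let mut found = Vec::new();
    let mut queue = VecDeque::from(vec![(root.to_path_buf(), 0u32)]);
    while let Some((dir, depth)) = queue.pop_front() {
        // Directories that cannot be listed are skipped.
        if let Ok((files, dirs)) = source.list(&dir) {
            found.extend(files);
            if depth < max_depth {
                queue.extend(dirs.into_iter().map(|d| (d, depth + 1)));
            }
        }
    }
    found
}
```

In production, a second DirSource implementation would wrap std::fs::read_dir. A simpler alternative that still avoids committing fixture files is to build the directory tree under std::env::temp_dir() at the start of each test.
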
Other feedback and input is of course most welcome as well!
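
On documentation: for_path panics when the path is not a directory, but its documentation does not say so. The Rust API Guidelines recommend describing such failure modes in "# Panics" and "# Errors" sections. Below is a minimal sketch of that convention. require_dir and try_require_dir are hypothetical helpers, not part of the library, and the second one also shows what a non-panicking variant returning io::Result could look like.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Checks that `path` is a directory and returns an owned copy of it.
///
/// # Panics
///
/// Panics if `path` does not exist or is not a directory.
fn require_dir(path: &Path) -> PathBuf {
    try_require_dir(path).unwrap_or_else(|e| panic!("{}", e))
}

/// Fallible counterpart of [`require_dir`].
///
/// # Errors
///
/// Returns an error of kind [`io::ErrorKind::InvalidInput`] if `path`
/// does not exist or is not a directory.
fn try_require_dir(path: &Path) -> io::Result<PathBuf> {
    if path.is_dir() {
        Ok(path.to_path_buf())
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("Path is not a directory: {:?}", path),
        ))
    }
}
```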
