4
\$\begingroup\$

I'm new to rust, and this is my very first rust program.

I'd like to write a simple CLI app to help me running some Git command on multiple repositories. Since my goal is learning rust, I force myself not to use any dependency so that I have to implement the tools I need. I started writing a very small API for parsing command line arguments. My code builds and seems to work as expected on simple cases. I'd like to have some advice on those few lines: what's bad in this code, what should I change right now? The things I've struggled (just to make the code build):

Lifetime It looks like that when doing data structure composition, I have to propagate lifetime nearly everywhere as soon as I use references to avoid unnecessary copy. Am I right to worry about this, or should I just get use to this when coding with rust? Could it be better/simpler than what I did?

borrow checker In 'command_line_parse' I have two vectors as parameter.

  • The first one is simply the command line arguments to analyse.
  • The second one is a list defining each command (with argument) and options I'd like to handle.

The function return a new vector with references to the commands and options definition found in command line. A command may also have a parameter (next to the command) ; but if the next element after a command have been identified as another command or option, the command has no parameter.

In the last for of commandline.rs, I had to build and use a temporary vector 'index_match': Since I already have a mutable reference to iterate vector 'result', I wasn't able to search directly in 'result' for 'index_command_param' in a nested loop because of the borrow checker. Could I avoid building 'index_match' temporary vector to do this ?

main.rs

mod commandline; use crate::commandline::*; use std::env; fn main() { let syntax: Vec<ArgumentDefinition> = vec![ ArgumentDefinition { short: Some("-l"), long: Some("--list"), kind: ArgumentKind::Option, expect: None, description: "list repositories.", }, ArgumentDefinition { short: Some("-p"), long: Some("--pull"), kind: ArgumentKind::Option, expect: None, description: "pull repositories branch.", }, ArgumentDefinition { short: Some("-s"), long: Some("--switch"), kind: ArgumentKind::Command, expect: Some("branch"), description: "switch repositories branch.", }, ]; let arguments: Vec<String> = env::args().collect(); let result = command_line_parse(&arguments, &syntax); dbg!(result); } 

commandline.rs

pub type ArgumentIndex = usize; #[derive(Debug)] #[allow(dead_code)] pub enum ArgumentKind { Command, Option, } #[derive(Debug)] pub struct ArgumentDefinition<'a> { pub short: Option<&'a str>, pub long: Option<&'a str>, pub kind: ArgumentKind, pub expect: Option<&'a str>, pub description: &'a str, } #[derive(Debug)] pub struct ArgumentValue<'a> { pub index: ArgumentIndex, pub parameter: Option<&'a str>, } #[derive(Debug)] pub struct ArgumentParseResult<'a> { pub result: ArgumentValue<'a>, pub argument: &'a ArgumentDefinition<'a>, } fn option_seek( arguments: &Vec<String>, pattern_short: Option<&str>, pattern_long: Option<&str>, ) -> Option<usize> { let closure_argument_search = |arguments: &Vec<String>, patern: Option<&str>| -> Option<usize> { if let Some(patern_value) = patern { const SHIFT: usize = 1; for (current_index, current_value) in arguments[SHIFT..].iter().enumerate() { if current_value.eq(patern_value) { return Some(current_index + SHIFT); } } } None }; if let Some(index) = closure_argument_search(&arguments, pattern_short) { return Some(index); } if let Some(index) = closure_argument_search(&arguments, pattern_long) { return Some(index); } None } pub fn command_line_parse<'a>( arguments: &'a Vec<String>, definition: &'a Vec<ArgumentDefinition>, ) -> Vec<ArgumentParseResult<'a>> { let mut result: Vec<ArgumentParseResult> = Vec::new(); let mut index_match: Vec<ArgumentIndex> = Vec::new(); for argument_definition in definition { if let Some(index) = option_seek( arguments, argument_definition.short, argument_definition.long, ) { index_match.push(index); let argument_value = ArgumentValue { index: index, parameter: None, }; let argument_parse_result = ArgumentParseResult { result: argument_value, argument: &*argument_definition, }; result.push(argument_parse_result); } } for result_element in &mut result { match result_element.argument.kind { ArgumentKind::Command => { let index_command_param = result_element.result.index + 1; if index_match .iter() .position(|&r| r == index_command_param) .is_none() { result_element.result.parameter = Some(&arguments[index_command_param]) } } ArgumentKind::Option => {} } } return result; } 
\$\endgroup\$
4
  • 1
    \$\begingroup\$I don't agree with the mindset that you learn a language by reinventing the wheel. On the contrary. When programming it is also necessary to aquire knowledge about the ecosystem and thus quasi-standard packages that are readily available to accomplish common tasks.\$\endgroup\$CommentedMar 27, 2024 at 12:44
  • \$\begingroup\$When I've learned data structure and algorithm at school, I had a lot of exercise to implement: double linked list, hash map, ... There are libraries to do this in any languages, but the exercise is useful.\$\endgroup\$
    – Marcus
    CommentedMar 27, 2024 at 19:24
  • \$\begingroup\$To learn the concepts behind those data structures, yes. But not in order to learn the respective language.\$\endgroup\$CommentedMar 27, 2024 at 19:46
  • \$\begingroup\$I did the same in C++ many times and it really helped me to improve my skills. Implementing myself some data structure before studying some libs source code like boost was a better time investment for me than just reading boost source code. We don't necessary share the same most effective methods to learn stuff and I know what's mine.\$\endgroup\$
    – Marcus
    CommentedMar 27, 2024 at 20:13

1 Answer 1

2
\$\begingroup\$

General

The code is well laid out, and there is a clear separation of responsibilities between the various functions.

This makes it easy to review, thanks!

Documentation?

There is no documentation, no specification of behavior, and therefore it's up to anyone's guess what the function should or should not do.

For example, there is no check that two distinct specified arguments should have different flags. What happens if they don't?

Tests?

Similarly, there is no test.

It's incredibly easy to have tests in Rust, just write them at the bottom of the file:

#[cfg(test)] // The following sub-module is only compiled in test mode. mod test { use super::*; #[test] #[should_panic] fn panic_on_duplicate_flags() { // Write test here. } } // mod test 

Non-UTF8 arguments

Your parser does not handle non UTF-8 arguments, by virtue of using String.

It's possible to use OsString instead (std::env::os_args) to have more graceful behavior. In particular, filenames may not be UTF-8.

It's also reasonable to just stick to String, but it'd be worth documenting the limitation so users are aware of it.

Non-idiomatic arguments

In general, Rust code tends to aim for maximum applicability.

There is nothing in option_seek that requires a Vec: you just need a slice! Therefore, arguments should be passed as &[String], not &Vec[String].

It's such a common mistake that it's built into Clippy, the built-in linter. Just run cargo clippy on your project (and cargo clippy --all-targets to cover tests and examples as well) and it'll help you improve your code.

Algorithmic Complexity

The current algorithmic complexity of your parser is O(passed-args * specified-args), while in general it's expected that parsers can be O(passed-args).

The expected complexity is generally accomplished by creating a map flag -> action and then consuming the arguments with it.

Memory Allocations

Your code does a number of memory allocations, or leads to generating memory allocations.

First of all, it's unnecessary to collect the command line arguments in a Vec, you can just pass the iterator.

Secondly, it's unnecessary to ask for &Vec<ArgumentDefinition> when a slice would do, and thus it's unnecessary to use a Vec for the specification when the number of arguments is known at compile-time.

Thirdly, you should give the String (or OsString) back to the user in the result instead of borrowing it then throwing it away: the user may have use for it!

Fourthly... if you asked the user to provide callbacks, you could do away with the resulting Vec as well!

Example: allocation-less parser (or close to!)

See a sketch of a memory-less parser on the Playground.

(Well, not quite allocation-less, as I use HashMap for lookups)

use std::collections::HashMap; pub struct Optional<'a, T, E> { short: Option<char>, long: Option<&'a str>, description: &'a str, action: Action<T, E>, } pub type Action<T, E> = fn(&mut dyn Iterator<Item = String>, &mut T) -> Result<(), E>; pub fn parse_arguments<'a, T, E, I>( result: &mut T, arguments: I, options: &[Optional<'a, T, E>], positional: fn(String, &mut T) -> Result<(), E>, supplemental: fn(String, &mut T) -> Result<(), E>, ) -> Result<(), ArgumentParseError<'a, E>> where I: IntoIterator<Item = String>, { let lookup = Lookup::new(options)?; let mut arguments = arguments.into_iter(); // Discard first argument, typically the name of the program. arguments.next(); // Parse the actual arguments. while let Some(argument) = arguments.next() { // Any argument passed after '--' is an argument for a subprogram. if argument == "--" { break } if let Some(long) = argument.strip_prefix("--") { let Some((_, action)) = lookup.longs.get(long) else { return Err(ArgumentParseError::UnknownLongFlag(argument)); }; action(&mut arguments, result)?; continue; } if let Some(shorts) = argument.strip_prefix('-') { for short in shorts.chars() { let Some((_, action)) = lookup.shorts.get(&short) else { return Err(ArgumentParseError::UnknownShortFlag(short)); }; action(&mut arguments, result)?; } } positional(argument, result)?; } for argument in arguments { supplemental(argument, result)?; } Ok(()) } #[derive(Debug)] pub enum ArgumentParseError<'a, E> { DuplicateShortFlag(char, &'a str, &'a str), DuplicateLongFlag(&'a str, &'a str, &'a str), UnknownShortFlag(char), UnknownLongFlag(String), User(E), } impl<'a, E> From<E> for ArgumentParseError<'a, E> { fn from(user: E) -> Self { Self::User(user) } } // TODO: implement Display and Error. // // Implementation details // struct Lookup<'a, T, E> { shorts: HashMap<char, (&'a str, Action<T, E>)>, longs: HashMap<&'a str, (&'a str, Action<T, E>)>, } impl<'a, T, E> Lookup<'a, T, E> { fn new(specification: &[Optional<'a, T, E>]) -> Result<Self, ArgumentParseError<'a, E>> { let mut shorts = HashMap::with_capacity(specification.len()); let mut longs = HashMap::with_capacity(specification.len()); for argument in specification { if let Some(short) = argument.short { let previous = shorts.insert(short, (argument.description, argument.action)); if let Some(previous) = previous { return Err(ArgumentParseError::DuplicateShortFlag(short, previous.0, argument.description)); } } if let Some(long) = argument.long { let previous = longs.insert(long, (argument.description, argument.action)); if let Some(previous) = previous { return Err(ArgumentParseError::DuplicateLongFlag(long, previous.0, argument.description)); } } } Ok(Self { shorts, longs, }) } } ```
\$\endgroup\$

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.