3
\$\begingroup\$

I am attempting to re-implement a postponed sieve algorithm for generating prime numbers in Rust. I am able to make a solution that works, but I have to use a couple of .clone() calls which I believe are killing my performance (the Rust solution ends up ~8x slower than the Python solution).

I would love some advice on how I can avoid the .clone() calls while avoiding errors from the borrow checker.

use std::collections::HashMap; #[derive(Debug, Clone)] struct Primes { i: usize, curr_candidate: u64, next_relevant_prime: u64, next_relevant_prime_squared: u64, sieve: HashMap<u64, u64>, initial_primes: Vec<u64>, internal_primes: Box<Option<Primes>>, } impl Primes { fn new() -> Primes { Primes { i: 0, curr_candidate: 7, next_relevant_prime: 0, next_relevant_prime_squared: 0, sieve: HashMap::new(), initial_primes: vec![2, 3, 5, 7], internal_primes: Box::new(None), } } } impl Iterator for Primes { type Item = u64; fn next(&mut self) -> Option<Self::Item> { let len = self.initial_primes.len(); let mut internal_primes; if self.i < len { self.i += 1; return Some(self.initial_primes[self.i - 1]); } else if self.i == len { self.i += 1; internal_primes = Primes::new(); self.internal_primes = Box::new(Some(internal_primes.clone())); internal_primes.next(); // skip 2 self.next_relevant_prime = internal_primes.next().unwrap(); self.next_relevant_prime_squared = self.next_relevant_prime.pow(2); } else { internal_primes = self.internal_primes.clone().unwrap(); } let mut i = self.curr_candidate; loop { i += 2; let step; if self.sieve.contains_key(&i) { // composite step = self.sieve.remove(&i).unwrap(); } else if i < self.next_relevant_prime_squared { // prime // save state for next round self.curr_candidate = i; self.internal_primes = Box::new(Some(internal_primes)); return Some(i); } else { // i == next_relevant_prime_squared step = 2 * self.next_relevant_prime; self.next_relevant_prime = internal_primes.next().unwrap(); self.next_relevant_prime_squared = self.next_relevant_prime.pow(2); } let mut j = i; j += step; while self.sieve.contains_key(&j) { j += step; } self.sieve.insert(j, step); } } } fn main() { let mut primes = Primes::new(); for _i in 0..99_999 { primes.next(); } println!("The 100,000th prime is {}", primes.next().unwrap()) } 
\$\endgroup\$

    2 Answers 2

    2
    \$\begingroup\$

    I was able to avoid the calls to .clone() with two changes.

    First, we can avoid setting

    self.internal_primes = Box::new(Some(internal_primes.clone())); 

    at all, and instead delay assigning internal_primes to self.internal_primes until right before the function returns. This avoids having multiple mutable references to the same Primes instance.

    Second, we can use the other .clone() call by using mem::replace:

    internal_primes = mem::replace(&mut self.internal_primes, Box::new(None)).unwrap(); 

    I also realized that cargo run is running in debug mode (at least for my project), which is much slower than I would have though. Compiling for release mode ends up being much faster regardless of whether we use .clone() (although it's still about 2x faster to avoid them):

    $ cargo build --release $ time ./target/release/primes The 100,000th prime is 1299709 ./target/release/primes 0.07s user 0.00s system 34% cpu 0.216 total 

    The final version of the code:

    use std::collections::HashMap; use std::mem; #[derive(Debug, Clone)] struct Primes { i: usize, curr_candidate: u64, next_relevant_prime: u64, next_relevant_prime_squared: u64, sieve: HashMap<u64, u64>, initial_primes: Vec<u64>, internal_primes: Box<Option<Primes>>, } impl Primes { fn new() -> Primes { Primes { i: 0, curr_candidate: 7, next_relevant_prime: 0, next_relevant_prime_squared: 0, sieve: HashMap::new(), initial_primes: vec![2, 3, 5, 7], internal_primes: Box::new(None), } } } impl Iterator for Primes { type Item = u64; fn next(&mut self) -> Option<Self::Item> { let len = self.initial_primes.len(); let mut internal_primes; if self.i < len { self.i += 1; return Some(self.initial_primes[self.i - 1]); } else if self.i == len { self.i += 1; internal_primes = Primes::new(); internal_primes.next(); // skip 2 self.next_relevant_prime = internal_primes.next().unwrap(); self.next_relevant_prime_squared = self.next_relevant_prime.pow(2); } else { internal_primes = mem::replace(&mut self.internal_primes, Box::new(None)).unwrap(); } let mut i = self.curr_candidate; loop { i += 2; let step; if self.sieve.contains_key(&i) { // composite step = self.sieve.remove(&i).unwrap(); } else if i < self.next_relevant_prime_squared { // prime // save state for next round self.curr_candidate = i; self.internal_primes = Box::new(Some(internal_primes)); return Some(i); } else { // i == next_relevant_prime_squared step = 2 * self.next_relevant_prime; self.next_relevant_prime = internal_primes.next().unwrap(); self.next_relevant_prime_squared = self.next_relevant_prime.pow(2); } let mut j = i; j += step; while self.sieve.contains_key(&j) { j += step; } self.sieve.insert(j, step); } } } fn main() { let mut primes = Primes::new(); for _i in 0..99_999 { primes.next(); } println!("The 100,000th prime is {}", primes.next().unwrap()) } 
    \$\endgroup\$
    1
    • \$\begingroup\$One more idiomatic thing would be to do primes.nth(10_000).unwrap() to get the 10,000th prime, rather than the useless loop.\$\endgroup\$
      – PitaJ
      CommentedOct 28, 2021 at 19:15
    0
    \$\begingroup\$

    A normal Sieve of Eratosthenes would look like:

    fn primes(n: usize) -> impl Iterator<Item = usize> { const START: usize = 2; if n < START { Vec::new() } else { let mut is_prime = vec![true; n + 1 - START]; let limit = (n as f64).sqrt() as usize; for i in START..limit + 1 { let mut it = is_prime[i - START..].iter_mut().step_by(i); if let Some(true) = it.next() { it.for_each(|x| *x = false); } } is_prime } .into_iter() .enumerate() .filter_map(|(e, b)| if b { Some(e + START) } else { None }) } 
    1. Starting at an offset of 2 means that an n < 2 input requires zero allocations because Vec::new() doesn't allocate memory until elements are pushed into it.
    2. Using Vec as an output to the if .. {} else {} condition means the output is statically deterministic, avoiding the need for a boxed trait object.
    3. Iterating is_prime with .iter_mut() and then using .step_by(i) makes all the optimizations required, and removes a lot of tediousness.
    4. Returning impl Iterator allows for static dispatching instead of dynamic dispatching, which is possible because the type is now statically known at compile-time, making the zero input/output condition order of magnitude faster.

    If you need to calculate primes as the range increases above 10's of millions then check out https://rosettacode.org/wiki/Sieve_of_Eratosthenes#Unbounded_Page-Segmented_bit-packed_odds-only_version_with_Iterator

    \$\endgroup\$
    1
    • 3
      \$\begingroup\$Thank you for the answer Margus. I know that what you posted is the more standard sieve, however, I think you'll find the algorithm I posted about is more memory efficient, and faster (at least in python and Julia) than the standard sieve. I am implementing it mainly as an exercise in Rust, so I am more interested in an answer which helps me understand how to implement the algorithm I shared rather than how to implement a totally different algorithm.\$\endgroup\$
      – kbrose
      CommentedOct 26, 2021 at 15:59

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.