Generating prime numbers quickly in rust

Question

I am attempting to re-implement a postponed sieve algorithm for generating prime numbers in Rust. I am able to make a solution that works, but I have to use a couple of .clone() calls which I believe are killing my performance (the Rust solution ends up ~8x slower than the Python solution).

I would love some advice on how I can avoid the .clone() calls while avoiding errors from the borrow checker.

use std::collections::HashMap; #[derive(Debug, Clone)] struct Primes { i: usize, curr_candidate: u64, next_relevant_prime: u64, next_relevant_prime_squared: u64, sieve: HashMap<u64, u64>, initial_primes: Vec<u64>, internal_primes: Box<Option<Primes>>, } impl Primes { fn new() -> Primes { Primes { i: 0, curr_candidate: 7, next_relevant_prime: 0, next_relevant_prime_squared: 0, sieve: HashMap::new(), initial_primes: vec![2, 3, 5, 7], internal_primes: Box::new(None), } } } impl Iterator for Primes { type Item = u64; fn next(&mut self) -> Option<Self::Item> { let len = self.initial_primes.len(); let mut internal_primes; if self.i < len { self.i += 1; return Some(self.initial_primes[self.i - 1]); } else if self.i == len { self.i += 1; internal_primes = Primes::new(); self.internal_primes = Box::new(Some(internal_primes.clone())); internal_primes.next(); // skip 2 self.next_relevant_prime = internal_primes.next().unwrap(); self.next_relevant_prime_squared = self.next_relevant_prime.pow(2); } else { internal_primes = self.internal_primes.clone().unwrap(); } let mut i = self.curr_candidate; loop { i += 2; let step; if self.sieve.contains_key(&i) { // composite step = self.sieve.remove(&i).unwrap(); } else if i < self.next_relevant_prime_squared { // prime // save state for next round self.curr_candidate = i; self.internal_primes = Box::new(Some(internal_primes)); return Some(i); } else { // i == next_relevant_prime_squared step = 2 * self.next_relevant_prime; self.next_relevant_prime = internal_primes.next().unwrap(); self.next_relevant_prime_squared = self.next_relevant_prime.pow(2); } let mut j = i; j += step; while self.sieve.contains_key(&j) { j += step; } self.sieve.insert(j, step); } } } fn main() { let mut primes = Primes::new(); for _i in 0..99_999 { primes.next(); } println!("The 100,000th prime is {}", primes.next().unwrap()) }

kbrose · Accepted Answer · 2021-10-27 01:55:15Z

I was able to avoid the calls to .clone() with two changes.

First, we can avoid setting

self.internal_primes = Box::new(Some(internal_primes.clone()));

at all, and instead delay assigning internal_primes to self.internal_primes until right before the function returns. This avoids having multiple mutable references to the same Primes instance.

Second, we can use the other .clone() call by using mem::replace:

internal_primes = mem::replace(&mut self.internal_primes, Box::new(None)).unwrap();

I also realized that cargo run is running in debug mode (at least for my project), which is much slower than I would have though. Compiling for release mode ends up being much faster regardless of whether we use .clone() (although it's still about 2x faster to avoid them):

$ cargo build --release $ time ./target/release/primes The 100,000th prime is 1299709 ./target/release/primes 0.07s user 0.00s system 34% cpu 0.216 total

The final version of the code:

use std::collections::HashMap; use std::mem; #[derive(Debug, Clone)] struct Primes { i: usize, curr_candidate: u64, next_relevant_prime: u64, next_relevant_prime_squared: u64, sieve: HashMap<u64, u64>, initial_primes: Vec<u64>, internal_primes: Box<Option<Primes>>, } impl Primes { fn new() -> Primes { Primes { i: 0, curr_candidate: 7, next_relevant_prime: 0, next_relevant_prime_squared: 0, sieve: HashMap::new(), initial_primes: vec![2, 3, 5, 7], internal_primes: Box::new(None), } } } impl Iterator for Primes { type Item = u64; fn next(&mut self) -> Option<Self::Item> { let len = self.initial_primes.len(); let mut internal_primes; if self.i < len { self.i += 1; return Some(self.initial_primes[self.i - 1]); } else if self.i == len { self.i += 1; internal_primes = Primes::new(); internal_primes.next(); // skip 2 self.next_relevant_prime = internal_primes.next().unwrap(); self.next_relevant_prime_squared = self.next_relevant_prime.pow(2); } else { internal_primes = mem::replace(&mut self.internal_primes, Box::new(None)).unwrap(); } let mut i = self.curr_candidate; loop { i += 2; let step; if self.sieve.contains_key(&i) { // composite step = self.sieve.remove(&i).unwrap(); } else if i < self.next_relevant_prime_squared { // prime // save state for next round self.curr_candidate = i; self.internal_primes = Box::new(Some(internal_primes)); return Some(i); } else { // i == next_relevant_prime_squared step = 2 * self.next_relevant_prime; self.next_relevant_prime = internal_primes.next().unwrap(); self.next_relevant_prime_squared = self.next_relevant_prime.pow(2); } let mut j = i; j += step; while self.sieve.contains_key(&j) { j += step; } self.sieve.insert(j, step); } } } fn main() { let mut primes = Primes::new(); for _i in 0..99_999 { primes.next(); } println!("The 100,000th prime is {}", primes.next().unwrap()) }

One more idiomatic thing would be to do primes.nth(10_000).unwrap() to get the 10,000th prime, rather than the useless loop. — PitaJ, CommentedOct 28, 2021 at 19:15

Margus · Accepted Answer · 2021-10-26 14:01:45Z

A normal Sieve of Eratosthenes would look like:

fn primes(n: usize) -> impl Iterator<Item = usize> { const START: usize = 2; if n < START { Vec::new() } else { let mut is_prime = vec![true; n + 1 - START]; let limit = (n as f64).sqrt() as usize; for i in START..limit + 1 { let mut it = is_prime[i - START..].iter_mut().step_by(i); if let Some(true) = it.next() { it.for_each(|x| *x = false); } } is_prime } .into_iter() .enumerate() .filter_map(|(e, b)| if b { Some(e + START) } else { None }) }

Starting at an offset of 2 means that an n < 2 input requires zero allocations because Vec::new() doesn't allocate memory until elements are pushed into it.
Using Vec as an output to the if .. {} else {} condition means the output is statically deterministic, avoiding the need for a boxed trait object.
Iterating is_prime with .iter_mut() and then using .step_by(i) makes all the optimizations required, and removes a lot of tediousness.
Returning impl Iterator allows for static dispatching instead of dynamic dispatching, which is possible because the type is now statically known at compile-time, making the zero input/output condition order of magnitude faster.

If you need to calculate primes as the range increases above 10's of millions then check out https://rosettacode.org/wiki/Sieve_of_Eratosthenes#Unbounded_Page-Segmented_bit-packed_odds-only_version_with_Iterator

Thank you for the answer Margus. I know that what you posted is the more standard sieve, however, I think you'll find the algorithm I posted about is more memory efficient, and faster (at least in python and Julia) than the standard sieve. I am implementing it mainly as an exercise in Rust, so I am more interested in an answer which helps me understand how to implement the algorithm I shared rather than how to implement a totally different algorithm. — kbrose, CommentedOct 26, 2021 at 15:59

Stack Exchange Network

Generating prime numbers quickly in rust

2 Answers 2

Hot Network Questions

Generating prime numbers quickly in rust

2 Answers 2

Related

Hot Network Questions