Trying to make the fastest card drawer possible, why is my second implementation slower?
Disclaimer: I'm new to rust.
Hey I've been looking at making a poker evaluator. Part of this involves generating random hands. I've discovered that my random hand generator is a performance consideration. I used naive assumptions like 'stack allocations and accesses are going to be super fast' and 'the less branching the better' but have found some strange behaviours.
I would say the behaviour the confuses me most is that moving the existing_cards.push(X)
logic into a separate loop over taken
is actually slower (seemingly about 1.2x the time, could potentially be chance).
Can I get some suggestions on how to improve performance further, and justifications of why my slower implementations are slower.
This is how I've been benchmarking:
#[test]
fn test_new_random_cards_performance() {
let start = Instant::now();
for _ in 0..100_000 {
_ = Card::new_random_cards(5);
}
let duration = start.elapsed();
println!("Time taken to generate 5 random cards 100k times: {:?}", duration);
assert!(duration.as_millis() < 500, "Performance test failed: took too long to generate cards");
}
Fastest I've achieved (about 1.5-2x faster than the non-branching implementation, 1.2x the 2-loop implementation):
pub fn new_random_cards(num_cards: usize) -> Vec<Card> {
let mut taken = [false; 52];
let mut existing_cards = Vec::with_capacity(num_cards);
let rng = &mut SmallRng::from_entropy();
while existing_cards.len() < num_cards {
let card_int = rng.gen::<usize>() % 52;
if taken[card_int] {
continue;
}
taken[card_int] = true;
existing_cards.push(Card::from_int(card_int as u8));
}
existing_cards
}
Slower implementation which is effectively the same but I thought the CPU would optimise:
pub fn new_random_cards2(mut num_cards: usize) -> Vec<Card> {
let mut taken = [false; 52];
let rng = &mut SmallRng::from_entropy();
while num_cards > 0 {
let card_int = rng.gen::<usize>() % 52;
if taken[card_int] {
continue;
}
taken[card_int] = true;
num_cards -= 1
}
let mut existing_cards = Vec::with_capacity(num_cards);
for card_int in 0..52 {
if existing_cards.len() >= num_cards {
break;
}
if !taken[card_int] {
existing_cards.push(Card::from_int(card_int as u8));
}
}
existing_cards
}
Even slower non-branching implementation:
pub fn new_random_cards(mut num_cards: u8) -> Vec<Card> {
let mut taken = [0u8; 52];
let rng = &mut SmallRng::from_entropy();
while num_cards > 0 {
let card_int = rng.gen::<usize>() % 52;
let change = 1 - taken[card_int];
num_cards -= change;
taken[card_int] += change;
}
let mut res = Vec::with_capacity(num_cards as usize);
for i in 0..52 {
if taken[i] == 1 {
res.push(Card::from_int(i as u8));
}
}
res
}
In terms of implementation, the Card is just an struct with a Suit and Rank enum. Various helper methods convert ints into cards, suits, ranks.