Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Fastest "trivial" way of shuffling a vector

I am working on a chess engine for some time now. For improving the engine, I wrote some code which loads chess-positions from memory into some tuner code. I have around 1.85B fens on my machine which adds up to 40Gb (24B per position).

After loading, I end up with a vector of positions:

struct Position{
   std::bitset<8*24> bits{};
}

void main(){
    std::vector<Position> positions{};
   
    // mimic some data loading
    for(int i = 0; i < 1.85e9; i++){
        positions.push_back(Position{})
    }    

    // ...
}

The data is organised in the following way:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The positions are taken from games where the positions are seperated by just a few moves. Usually about 40-50 consecutive moves come the same game / line and are therefor somewhat equal.

Eventually I will read 16384 position within a single batch and ideally none of those positions come from the same game. Therefor I do some initial sorting before using the data.

My current shuffling method is this:

    auto rng = std::default_random_engine {};
    std::shuffle(std::begin(positions), std::end(positions), rng);

Unfortunately this takes quiet some time (about 1-2 minutes). Since I dont require perfect shuffles, I assume that some easier shuffles exist.

My second aproach was:

    for(int i = 0; i < positions.size(); i++){
        std::swap(positions[i], positions[(i*16384) % positions.size()]);
    }

which will ensure that there are not going to be positions coming from the same game within a single batch and are evenly spaces by 16384 entries.

I was wondering if there is some even simpler, faster solution. Especially considering that the modulo-operator requires quiet some clock cycles.

I am happy for any "trivial" solution.

Greetings
Finn

>Solution :

There is a tradeoff to be made: Shuffling a a std::vector<size_t> of indices can be expected to be cheaper than shuffling a std::vector<Position> at the cost of an indirection when accessing the Positions via shuffled indices. Actually the example on cppreference for std::iota is doing something along that line (it uses iterators):

#include <algorithm>
#include <iostream>
#include <list>
#include <numeric>
#include <random>
#include <vector>
 
int main()
{
    std::list<int> l(10);
    std::iota(l.begin(), l.end(), -4);
 
    std::vector<std::list<int>::iterator> v(l.size());
    std::iota(v.begin(), v.end(), l.begin());
 
    std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()});
 
    std::cout << "Contents of the list: ";
    for(auto n: l) std::cout << n << ' ';
    std::cout << '\n';
 
    std::cout << "Contents of the list, shuffled: ";
    for(auto i: v) std::cout << *i << ' ';
    std::cout << '\n';
}

Instead of shuffling the list directly, a vector of iterators (with a std::vector indices woud work as well) is shuffled and std::shuffle only needs to swap iterators (/indices) rather than the more costly actual elements (in the example the "costly to swap" elements are just ints).

For a std::list I don’t expect a big difference between iterating in order or iterating via shuffled iterators. On the other hand, for a std::vector I do expect a significant impact. Hence, I would shuffle indices, then rearrange the vector once, and profile to see which performs better.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading