Can a PyTorch DataLoader start with an empty dataset?

I have a dataset stored in a deque buffer, and I want to load random batches from it with a DataLoader. The buffer starts empty, but data will always be added to it before it is sampled.

self.buffer = deque([], maxlen=capacity)
self.batch_size = batch_size
self.loader = DataLoader(self.buffer, batch_size=batch_size, shuffle=True, drop_last=True)

However, this causes the following error:

  File "env/lib/python3.8/site-packages/torch_geometric/loader/dataloader.py", line 78, in __init__
    super().__init__(dataset, batch_size, shuffle,
  File "env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 268, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "env/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 102, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

It turns out that the RandomSampler class checks that num_samples is positive when it is initialised, which causes the error.

if not isinstance(self.num_samples, int) or self.num_samples <= 0:
    raise ValueError("num_samples should be a positive integer "
                     "value, but got num_samples={}".format(self.num_samples))

Why does it check this at construction time, even though RandomSampler supports datasets that change size at runtime?

One workaround is to use an IterableDataset, but I want to use the shuffle functionality of DataLoader.

Can you think of a nice way to use a DataLoader with a deque? Much appreciated!

Solution:

The problem here is neither the use of a deque nor the fact that the dataset can grow dynamically. The problem is that you are starting with a dataset of size zero, which RandomSampler rejects when it is constructed.

The easiest solution is to start with an arbitrary placeholder object in the deque and remove it once real data has been added, before the first batch is sampled.
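
A minimal sketch of the placeholder idea follows. It uses the plain torch.utils.data.DataLoader rather than the torch_geometric loader from the traceback, and the ReplayBuffer class with its add and sample methods are hypothetical names invented for illustration; they are not part of PyTorch. It also assumes the buffered items are tensors of equal shape so that the default collate function can stack them.

```python
from collections import deque

import torch
from torch.utils.data import DataLoader


class ReplayBuffer:
    """Hypothetical wrapper; class and method names are illustrative only."""

    def __init__(self, capacity, batch_size):
        self.batch_size = batch_size
        # Seed the deque with one dummy item so that len(buffer) > 0 while
        # the DataLoader (and its RandomSampler) is being constructed.
        self.buffer = deque([object()], maxlen=capacity)
        self._has_placeholder = True
        self.loader = DataLoader(
            self.buffer, batch_size=batch_size, shuffle=True, drop_last=True
        )

    def add(self, item):
        # Drop the dummy item in place the first time real data arrives.
        # The DataLoader keeps a reference to the same deque object.
        if self._has_placeholder:
            self.buffer.clear()
            self._has_placeholder = False
        self.buffer.append(item)

    def sample(self):
        # With drop_last=True the loader yields nothing until at least
        # batch_size real items are available, so check explicitly.
        if self._has_placeholder or len(self.buffer) < self.batch_size:
            raise RuntimeError("not enough data in the buffer to form a batch")
        # A fresh iterator re-reads the current buffer length and reshuffles.
        return next(iter(self.loader))


buf = ReplayBuffer(capacity=10, batch_size=4)
for i in range(6):
    buf.add(torch.tensor([float(i)]))
batch = buf.sample()  # tensor holding 4 of the 6 stored items
```

Calling sample creates a new iterator each time, so every call draws a freshly shuffled batch from whatever the deque contains at that moment. The placeholder is never collated because it is cleared before the first real item is stored.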
