Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported

I am trying to load a dataset with the Hugging Face datasets Python library in a local Jupyter notebook (.ipynb). The notebook runs a Python 3.10.13 kernel, the same version as my virtual environment.

I cannot load the dataset used in the tutorial I am following. Here is the error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/Users/ari/Downloads/00-fine-tuning.ipynb Cell 2 line 3
      1 from datasets import load_dataset
----> 3 data = load_dataset(
      4     "jamescalam/agent-conversations-retrieval-tool",
      5     split="train"
      6 )
      7 data

File ~/Documents/fastapi_language_tutor/env/lib/python3.10/site-packages/datasets/load.py:2149, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   2145 # Build dataset for splits
   2146 keep_in_memory = (
   2147     keep_in_memory if keep_in_memory is not None else is_small_dataset(builder_instance.info.dataset_size)
   2148 )
-> 2149 ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
   2150 # Rename and cast features to match task schema
   2151 if task is not None:
   2152     # To avoid issuing the same warning twice

File ~/Documents/fastapi_language_tutor/env/lib/python3.10/site-packages/datasets/builder.py:1173, in DatasetBuilder.as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
   1171 is_local = not is_remote_filesystem(self._fs)
   1172 if not is_local:
-> 1173     raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
   1174 if not os.path.exists(self._output_dir):
   1175     raise FileNotFoundError(
   1176         f"Dataset {self.dataset_name}: could not find data in {self._output_dir}. Please make sure to call "
   1177         "builder.download_and_prepare(), or use "
   1178         "datasets.load_dataset() before trying to access the Dataset object."
   1179     )

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

How do I resolve this? I don't understand why this error applies: I am fetching the dataset from the Hub, so I don't see how it could already be cached in my LocalFileSystem.

Solution:

Upgrade datasets:

pip install -U datasets

This error is caused by a breaking change in fsspec, which was fixed in datasets 2.14.6. Upgrading to that release or newer with the command above should resolve it.
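Why a freshly fetched dataset still hits this code path: load_dataset first downloads and prepares the data in a local cache (by default under ~/.cache/huggingface/datasets), and as_dataset then builds the Dataset from that cache. In the traceback, as_dataset computes is_local = not is_remote_filesystem(self._fs). As far as I can tell from the linked issues, fsspec 2023.10.0 changed LocalFileSystem.protocol from the string "file" to the tuple ("file", "local"), so an equality check against "file" started classifying the local disk as remote. The sketch below reproduces that logic with hypothetical stand-in classes (LocalFSBefore, LocalFSAfter) instead of importing fsspec; is_remote_old approximates the pre-2.14.6 check, and is_remote_tolerant is only one possible fix, not necessarily the patch that shipped in datasets.

```python
# Hypothetical stand-ins for fsspec's LocalFileSystem; only the .protocol
# attribute matters for this check.
class LocalFSBefore:  # assumed behaviour of fsspec < 2023.10.0
    protocol = "file"


class LocalFSAfter:  # assumed behaviour of fsspec 2023.10.0
    protocol = ("file", "local")


def is_remote_old(fs) -> bool:
    # Equality check in the style of datasets < 2.14.6: anything whose
    # protocol is not exactly the string "file" is treated as remote.
    return fs is not None and fs.protocol != "file"


def is_remote_tolerant(fs) -> bool:
    # Illustrative fix: accept a single protocol string or a tuple of aliases.
    if fs is None:
        return False
    protocols = (fs.protocol,) if isinstance(fs.protocol, str) else tuple(fs.protocol)
    return "file" not in protocols


for fs in (LocalFSBefore(), LocalFSAfter()):
    # Mirrors the traceback: is_local = not is_remote_filesystem(self._fs)
    print(type(fs).__name__,
          "old check -> is_local =", not is_remote_old(fs),
          "| tolerant check -> is_local =", not is_remote_tolerant(fs))
```

With the tuple protocol, the old check yields is_local = False, which is exactly the branch in as_dataset that raises NotImplementedError, even though the filesystem really is local.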

GitHub issue: https://github.com/huggingface/datasets/issues/6352


If you cannot upgrade datasets, pin fsspec to the last release before the breaking change instead:

pip install fsspec==2023.9.2

The breaking change was introduced in fsspec 2023.10.0.

GitHub issue: https://github.com/huggingface/datasets/issues/6330
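If you juggle several virtual environments, a quick check can flag this combination before load_dataset fails. This is a minimal sketch, assuming (from the two linked issues) that the problem occurs with datasets older than 2.14.6 together with fsspec 2023.10.0 or newer; parse_version, has_known_conflict and installed_version are hypothetical helpers, and the naive parser stands in for something more robust such as packaging.version.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple[int, ...]:
    # Naive parser: keep the leading numeric components, e.g.
    # "2023.10.0" -> (2023, 10, 0) and "2.15.0.dev0" -> (2, 15, 0).
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_known_conflict(datasets_version: str, fsspec_version: str) -> bool:
    # Assumed bad combination: datasets < 2.14.6 with fsspec >= 2023.10.0.
    return (parse_version(datasets_version) < (2, 14, 6)
            and parse_version(fsspec_version) >= (2023, 10, 0))


def installed_version(name: str) -> str | None:
    # Installed distribution version, or None if the package is missing.
    try:
        return version(name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ds_ver = installed_version("datasets")
    fs_ver = installed_version("fsspec")
    print(f"datasets={ds_ver}, fsspec={fs_ver}")
    if ds_ver and fs_ver and has_known_conflict(ds_ver, fs_ver):
        print("Known conflict: run 'pip install -U datasets' "
              "or 'pip install fsspec==2023.9.2'.")
```

Run as a script inside the affected environment, it prints the installed versions and, if they match the assumed bad combination, suggests one of the two fixes above.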

Discover more from Dev solutions