Problem when I am trying to load txt in Jupyter Notebook

I'm trying to load all the txt files from a folder.
The code below usually works when I load txt files into a pandas DataFrame and concatenate them, but in this case it fails and I don't know why.

Here is the code:

import glob
import os

import pandas as pd

path = 'C:/Users/user/Documents/UNIAO'

csv_files = glob.glob(os.path.join(path, "*.txt"))

list_of_dataframes = []
# loop over the list of txt files
for f in csv_files:
    # read the file once to pick out its separator (the fifth character)
    with open(f, "r", encoding='unicode_escape') as text_file:
        data = text_file.read()
    separator = data[4]

    df = pd.read_csv(f, sep=separator, encoding='unicode_escape')
    list_of_dataframes.append(df)

Here is the error message:

ParserError                               Traceback (most recent call last)
Cell In [5], line 19
     16 separator = data[4]
---> 19 df = pd.read_csv(f, sep=separator, encoding ='unicode_escape')
     20 print(f)
     23 list_of_dataframes.append(df)

File c:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File c:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:680, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    665 kwds_defaults = _refine_defaults_read(
    666     dialect,
    667     delimiter,
   (...)
    676     defaults={"delimiter": ","},
    677 )
...
--> 739     raise ParserError(msg)
    740 elif self.on_bad_lines == self.BadLineHandleMethod.WARN:
    741     base = f"Skipping line {row_num}: "

ParserError: Expected 197 fields in line 11955, saw 198

Solution:

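The error means that line 11955 contains 198 fields while the parser expected 197, most likely because of an extra separator or otherwise corrupted data on that line. You could try the following.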

For Pandas >= 1.3.0

df = pd.read_csv(f, sep=separator, encoding='unicode_escape', on_bad_lines='skip')

For Pandas < 1.3.0

df = pd.read_csv(f, sep=separator, encoding='unicode_escape', error_bad_lines=False)

Do note that this will cause the offending lines to be skipped.
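
To make the effect of skipping concrete, here is a minimal sketch using a small in-memory sample rather than the real files. The sample data and the `;` separator are assumptions made up for illustration; the only pandas feature relied on is `on_bad_lines='skip'` (pandas >= 1.3.0).

```python
import io

import pandas as pd

# Hypothetical sample: the header has 3 fields, but the third line has 4.
raw = "a;b;c\n1;2;3\n4;5;6;7\n8;9;10\n"

# Requires pandas >= 1.3.0; the line with too many fields is dropped silently.
df = pd.read_csv(io.StringIO(raw), sep=";", on_bad_lines="skip")

# Compare against the number of data lines to see how many rows were lost.
total_data_lines = len(raw.splitlines()) - 1
skipped = total_data_lines - len(df)
print(df)
print(f"skipped {skipped} line(s)")
```

Counting the dropped rows this way is worth doing on the real files too, since a large number of skipped lines may point to a wrong separator rather than to a few bad rows.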

For more information, refer to the pandas documentation for read_csv.
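
Before discarding rows, it may help to look at which lines are actually malformed. The sketch below uses the standard library `csv` module to report rows whose field count differs from the header. The helper name `find_bad_lines` and the sample data are hypothetical, and the check is stricter than pandas: it also reports rows with too few fields, which `read_csv` would normally pad with NaN instead of rejecting.

```python
import csv
import io


def find_bad_lines(handle, separator):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(handle, delimiter=separator)
    header = next(reader)
    expected = len(header)
    bad = []
    for row in reader:
        # Ignore blank lines, as pandas does by default.
        if row and len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


# Hypothetical sample with one over-long row on line 3.
sample = io.StringIO("a;b;c\n1;2;3\n4;5;6;7\n8;9;10\n")
print(find_bad_lines(sample, ";"))  # [(3, 4)]
```

For a real file, the handle would come from something like `open(f, newline="", encoding="unicode_escape")`, with `separator` detected the same way as in the question.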
