I am having trouble reading in a CSV file in which one of the row values itself contains a comma.
An example row containing the data that (as far as I can tell) causes the issue is as follows:
['true',47,'y','descriptive_evidence','n','true',66,[81,65]]
I think the [81,65] entry is being split on its inner comma and treated as two separate fields, [81 and 65]. Is there any way to override this in pandas, or do I have to manually replace the comma before reading the file into a dataframe?
From reading other answers, I am aware that rows can be skipped using something like error_bad_lines=False, but in this case I can't afford to skip these entries.
Best Wishes 🙂
>Solution :
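You could pass a regular expression as sep, but pandas will then use the Python parsing engine instead of the C engine, which can cost more time and memory on large files. If that trade-off is acceptable, here is a solution, shown as a sample file, the read_csv call, and the resulting dataframe: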
1,2,3,4,5,6,7,8
'true',47,'y','descriptive_evidence','n','true',66,[81,65]
import pandas as pd
df = pd.read_csv("./file_name.csv",sep=r",(?![^[]*\])",engine="python")
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | ------ | --- | --- | ---------------------- | --- | ------ | --- | ------- |
| 0 | 'true' | 47 | 'y' | 'descriptive_evidence' | 'n' | 'true' | 66 | [81,65] |
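To try this without a file on disk, here is a minimal, self-contained sketch of the same approach. The sample data is the two lines from the answer above, io.StringIO stands in for file_name.csv, and the names raw, SEP and fields are only illustrative. The pattern is the same as above except that the inner [ is escaped for readability. The last step, which turns the bracketed string into a real list with ast.literal_eval, is an addition that assumes the field is always a valid Python list literal.

```python
import ast
import io
import re

import pandas as pd

# Sample data copied from the answer; io.StringIO stands in for the file on disk.
raw = (
    "1,2,3,4,5,6,7,8\n"
    "'true',47,'y','descriptive_evidence','n','true',66,[81,65]\n"
)

# Split on a comma only if it is NOT followed by "non-[ characters, then ]",
# i.e. only if the comma sits outside a [...] group.
SEP = r",(?![^\[]*\])"

# The same pattern applied to a single line with re.split, to show the idea.
fields = re.split(SEP, raw.splitlines()[1])

# A regex separator is handled by the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=SEP, engine="python")

# Assumption: the bracketed field is a valid Python list literal,
# so the string "[81,65]" can be converted into an actual list.
df["8"] = df["8"].apply(ast.literal_eval)

print(fields)
print(df)
```

Two caveats: the lookahead only handles a single level of brackets, so nested lists such as [[1,2],[3,4]] would need a different approach, and with a regex separator pandas may ignore quoting, so quoted fields that contain commas are not protected.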