Read a CSV file that contains multiple different CSVs

I have a .csv file, but it really contains 5 separate CSVs, and this is causing some issues for me.

I’m currently trying to read the file like this:
data = pd.read_csv("./PA_Logs/summary.csv", skiprows=1)

But this results in an error saying that the expected number of fields didn’t match the number of fields present, which makes sense since the CSVs have different numbers of fields.

I’ve also tried:
data = pd.read_csv("./PA_Logs/summary.csv", skiprows=[1,9:])
That was an attempt to get just the first table, but I get a syntax error.
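
One way to pull out only the first table, sketched here under the assumption that it always holds exactly five data rows (as in the sample file), is to combine `skiprows` with `nrows` instead of passing a list that contains a slice. `9:` is slice syntax and is not valid inside a list literal, which is why the call above does not parse. The `sample` string and the `read_first_table` helper are hypothetical stand-ins, not part of the original post.

```python
from io import StringIO

import pandas as pd

# Shortened stand-in for summary.csv (assumption: same layout as the real file)
sample = """RUN_ID_HERE
Level,Yield,Projected Yield,Aligned,Error Rate,Intensity C1,%>=Q30
Read 1,1.31,1.31,23.92,1.40,170,91.09
Read 2 (I),0.12,0.12,0.00,nan,396,61.56
Read 3 (I),0.12,0.12,0.00,nan,319,91.23
Read 4,2.01,2.01,26.34,2.08,145,88.46
Total,8.00,8.00,24.12,1.65,274,81.46

Read 1
Lane,Surface,Tiles,Density,Cluster PF,Reads,Reads PF,%>=Q30,Yield
1,-,38,537 +/- 10,95.79 +/- 0.52,13.59,13.02,93.19,3.91
"""


def read_first_table(source, n_rows=5):
    """Hypothetical helper: skip the run-ID line, then read only n_rows data rows."""
    # skiprows=1 drops the run-ID line, so the next line becomes the header;
    # nrows stops parsing before the wider tables further down are reached.
    return pd.read_csv(source, skiprows=1, nrows=n_rows)


summary = read_first_table(StringIO(sample))
print(summary.shape)
```

With the real file, the call would presumably be `read_first_table("./PA_Logs/summary.csv")`. This only covers the top table and breaks if the number of rows in it ever changes.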

I would like to read the file as 5 separate tables: one for the top section and one for each of the four bottom sections.

This file is produced by another piece of software that cannot be changed, so I can’t do anything about the file structure.

File Structure:

RUN_ID_HERE
Level,Yield,Projected Yield,Aligned,Error Rate,Intensity C1,%>=Q30
Read 1,1.31,1.31,23.92,1.40,170,91.09
Read 2 (I),0.12,0.12,0.00,nan,396,61.56
Read 3 (I),0.12,0.12,0.00,nan,319,91.23
Read 4,2.01,2.01,26.34,2.08,145,88.46
Total,8.00,8.00,24.12,1.65,274,81.46

Read 1
Lane,Surface,Tiles,Density,Cluster PF,Legacy Phasing/Prephasing Rate,Phasing  slope/offset,Prephasing slope/offset,Reads,Reads PF,%>=Q30,Yield,Cycles Error,Aligned,Error,Error (35),Error (75),Error (100),Intensity C1
1,-,38,537 +/- 10,95.79 +/- 0.52,0.156 / 0.141,nan / nan,nan / nan,13.59,13.02,93.19,3.91,300,29.32 +/- 1.21,1.80 +/- 0.05,0.83 +/- 0.12,0.51 +/- 0.06,0.48 +/- 0.10,174 +/- 16
1,1,19,537 +/- 11,96.18 +/- 0.34,0.128 / 0.149,nan / nan,nan / nan,6.84,6.58,92.93,1.97,-,30.31 +/- 0.73,1.80 +/- 0.05,0.77 +/- 0.11,0.47 +/- 0.06,0.44 +/- 0.05,189 +/- 3
1,2,19,537 +/- 10,95.41 +/- 0.35,0.184 / 0.134,nan / nan,nan / nan,6.75,6.44,93.46,1.93,-,28.32 +/- 0.63,1.80 +/- 0.06,0.90 +/- 0.09,0.54 +/- 0.05,0.52 +/- 0.12,159 +/- 6
Read 2 (I)
Lane,Surface,Tiles,Density,Cluster PF,Legacy Phasing/Prephasing Rate,Phasing  slope/offset,Prephasing slope/offset,Reads,Reads PF,%>=Q30,Yield,Cycles Error,Aligned,Error,Error (35),Error (75),Error (100),Intensity C1
1,-,38,537 +/- 10,95.79 +/- 0.52,0.000 / 0.000,nan / nan,nan / nan,13.59,13.02,68.50,0.12,0,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,456 +/- 40
1,1,19,537 +/- 11,96.18 +/- 0.34,nan / nan,nan / nan,nan / nan,6.84,6.58,68.42,0.06,-,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,495 +/- 9
1,2,19,537 +/- 10,95.41 +/- 0.35,nan / nan,nan / nan,nan / nan,6.75,6.44,68.58,0.06,-,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,417 +/- 3
Read 3 (I)
Lane,Surface,Tiles,Density,Cluster PF,Legacy Phasing/Prephasing Rate,Phasing  slope/offset,Prephasing slope/offset,Reads,Reads PF,%>=Q30,Yield,Cycles Error,Aligned,Error,Error (35),Error (75),Error (100),Intensity C1
1,-,38,537 +/- 10,95.79 +/- 0.52,0.000 / 0.000,nan / nan,nan / nan,13.59,13.02,95.73,0.12,0,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,359 +/- 23
1,1,19,537 +/- 11,96.18 +/- 0.34,nan / nan,nan / nan,nan / nan,6.84,6.58,95.55,0.06,-,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,381 +/- 7
1,2,19,537 +/- 10,95.41 +/- 0.35,nan / nan,nan / nan,nan / nan,6.75,6.44,95.91,0.06,-,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,nan +/- nan,336 +/- 4
Read 4
Lane,Surface,Tiles,Density,Cluster PF,Legacy Phasing/Prephasing Rate,Phasing  slope/offset,Prephasing slope/offset,Reads,Reads PF,%>=Q30,Yield,Cycles Error,Aligned,Error,Error (35),Error (75),Error (100),Intensity C1
1,-,38,537 +/- 10,95.79 +/- 0.52,0.116 / 0.031,nan / nan,nan / nan,13.59,13.02,83.46,3.91,300,28.12 +/- 1.59,2.14 +/- 0.14,0.98 +/- 0.23,0.67 +/- 0.11,0.61 +/- 0.08,171 +/- 13
1,1,19,537 +/- 11,96.18 +/- 0.34,0.117 / 0.029,nan / nan,nan / nan,6.84,6.58,83.30,1.97,-,28.88 +/- 1.92,2.13 +/- 0.15,0.97 +/- 0.33,0.65 +/- 0.15,0.60 +/- 0.11,184 +/- 4
1,2,19,537 +/- 10,95.41 +/- 0.35,0.114 / 0.033,nan / nan,nan / nan,6.75,6.44,83.62,1.93,-,27.37 +/- 0.55,2.15 +/- 0.12,0.99 +/- 0.07,0.68 +/- 0.03,0.62 +/- 0.03,158 +/- 2
Extracted: 622
Called: 622
Scored: 622

Solution:

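This code reads the file line by line and treats every line without a comma (run ID, section titles, blank lines, trailing counters) as a separator between tables. Each run of comma-separated lines between two separators is collected as its own CSV and parsed into a DataFrame, and the DataFrames are gathered in a list.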

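from io import StringIO

import pandas as pd

# 'file.txt' stands in for the real path, e.g. "./PA_Logs/summary.csv"
with open('file.txt') as f:
    lines = [line.strip() for line in f]

blocks = []  # one list of lines per embedded CSV
block = []   # lines of the CSV currently being collected
for line in lines:
    if ',' not in line:
        # A line without a comma ends the current block
        blocks.append(block)
        block = []
    else:
        block.append(line)
blocks.append(block)  # keep the last block if the file ends on a data row

# Drop the empty blocks and parse each remaining one as its own CSV
dfs = [pd.read_csv(StringIO('\n'.join(block))) for block in blocks if block]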

Test:

>>> len(dfs)
5

>>> dfs[0]
        Level  Yield  Projected Yield  Aligned  Error Rate  Intensity C1  %>=Q30
0      Read 1   1.31             1.31    23.92        1.40           170   91.09
1  Read 2 (I)   0.12             0.12     0.00         NaN           396   61.56
2  Read 3 (I)   0.12             0.12     0.00         NaN           319   91.23
3      Read 4   2.01             2.01    26.34        2.08           145   88.46
4       Total   8.00             8.00    24.12        1.65           274   81.46

>>> dfs[-1]
   Lane Surface  Tiles     Density      Cluster PF Legacy Phasing/Prephasing Rate Phasing  slope/offset  ... Cycles Error         Aligned          Error     Error (35)     Error (75)    Error (100) Intensity C1
0     1       -     38  537 +/- 10  95.79 +/- 0.52                  0.116 / 0.031             nan / nan  ...          300  28.12 +/- 1.59  2.14 +/- 0.14  0.98 +/- 0.23  0.67 +/- 0.11  0.61 +/- 0.08   171 +/- 13
1     1       1     19  537 +/- 11  96.18 +/- 0.34                  0.117 / 0.029             nan / nan  ...            -  28.88 +/- 1.92  2.13 +/- 0.15  0.97 +/- 0.33  0.65 +/- 0.15  0.60 +/- 0.11    184 +/- 4
2     1       2     19  537 +/- 10  95.41 +/- 0.35                  0.114 / 0.033             nan / nan  ...            -  27.37 +/- 0.55  2.15 +/- 0.12  0.99 +/- 0.07  0.68 +/- 0.03  0.62 +/- 0.03    158 +/- 2
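
If it helps to know which table is which, a variation on the loop above can remember the last non-empty, comma-free line seen before each block and use it as a key. This is only a sketch: the `split_sections` helper and the shortened `sample` text are hypothetical, and it assumes that a title line never contains a comma and that titles are unique within a file.

```python
from io import StringIO

import pandas as pd

# Shortened stand-in for the real file (assumption: same layout, fewer columns)
sample = """RUN_ID_HERE
Level,Yield,%>=Q30
Read 1,1.31,91.09
Total,8.00,81.46

Read 1
Lane,Surface,Tiles
1,-,38
1,1,19
Read 2 (I)
Lane,Surface,Tiles
1,-,38
Extracted: 622
"""


def split_sections(text):
    """Hypothetical helper: map each section title to its own DataFrame."""
    sections = {}
    title, block = None, []
    for line in text.splitlines():
        line = line.strip()
        if ',' in line:
            block.append(line)
            continue
        # A comma-free line closes the current block, if there is one
        if block:
            sections[title] = pd.read_csv(StringIO('\n'.join(block)))
            block = []
        # A non-empty comma-free line is the candidate title for the next block
        if line:
            title = line
    if block:  # the text ended on a data row
        sections[title] = pd.read_csv(StringIO('\n'.join(block)))
    return sections


tables = split_sections(sample)
print(list(tables))
```

In this layout the top table ends up keyed by the run ID, since that is the only comma-free line before it, and the per-read tables are keyed by their titles, for example `tables["Read 1"]`.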