Create multiple-columns pandas dataframe from list

I can't figure out how to create a multi-column pandas DataFrame from a list. Some lines start with the character ">", and I want those lines to become the column headers. The number of lines after each header is not the same.

My list:

>header
a
b
>header2
c
d
e
f
>header3
g
h
i

Dataframe I want to create:

>header    >header2   >header3
a           c          g
b           d          h
            e          i
            f

>Solution:

Iterate through the lines and start a new column whenever a line begins with '>'. The remaining challenge is that this produces a dictionary of lists of unequal length, which the plain pd.DataFrame constructor rejects.
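To make the unequal-length problem concrete, here is a minimal sketch. The two-column dictionary is a made-up miniature of the parsed data, not part of the original question. It shows the constructor failing, and the row-wise build plus transpose succeeding.

```python
import pandas as pd

# Hypothetical miniature of the parsed data: two columns of different lengths
data = {">header": ["a", "b"], ">header2": ["c", "d", "e", "f"]}

try:
    pd.DataFrame(data)
    raised = False
except ValueError:
    # pandas refuses dict values whose lengths differ
    raised = True

# Treating each list as a row lets pandas pad the short rows with missing values;
# .T then turns the rows back into columns
df_small = pd.DataFrame.from_dict(data, orient="index").T
```

Here `raised` ends up `True`, and `df_small` has four rows, with the last two cells of the shorter column left empty.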

import pandas as pd

# The given list
lines = [">header", "a", "b", ">header2", "c", "d", "e", "f", ">header3", "g", "h", "i"]

# Iterate through the lines and collect the values under each header
data = {}
column = ''
for line in lines:
    if line.startswith('>'):
        # A header line starts a new column
        column = line
        data[column] = []
        continue
    data[column].append(line)

# Build row-wise so pandas pads the shorter lists with None, then transpose
df = pd.DataFrame.from_dict(data, orient='index').T

output:

  >header >header2 >header3
0       a        c        g
1       b        d        h
2    None        e        i
3    None        f     None
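
An alternative that avoids the transpose is to wrap each list in a pd.Series, since pandas aligns Series on their index and fills the gaps with NaN. The sketch below is one possible variant, not the original answer's method. The helper name `lines_to_frame` and its `marker` parameter are made up for illustration, and it assumes that any lines appearing before the first header should be skipped.

```python
import pandas as pd


def lines_to_frame(lines, marker=">"):
    """Hypothetical helper: group each line under the most recent header line."""
    data = {}
    column = None
    for line in lines:
        if line.startswith(marker):
            # A header line starts a new column
            column = line
            data[column] = []
        elif column is not None:
            # Assumption: lines before the first header are ignored
            data[column].append(line)
    # Each list becomes a Series; pandas aligns them on the index and
    # fills the missing positions with NaN
    return pd.DataFrame(
        {name: pd.Series(values, dtype="object") for name, values in data.items()}
    )


lines = [">header", "a", "b", ">header2", "c", "d", "e", "f", ">header3", "g", "h", "i"]
df_alt = lines_to_frame(lines)

# Optional: show blanks instead of NaN, as in the layout the question asks for
df_blank = df_alt.fillna("")
```

Whether to keep the missing values as NaN or replace them with empty strings depends on what happens next. NaN is easier to filter with `isna()` or `dropna()`, while empty strings only make sense for display.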