Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Prefixing each item in a Pandas DataFrame with an ascending number if it has a value

I’m currently trying to find the best solution to this problem:

I import a csv into Python, some of the cells have values and some don’t. I want to loop through the dataframe and if a cell has a value, append a number to it which is incremented each time a new cell has text. These appended values are then saved to a new column

I also save each piece of text to another list as I need to use them in the next step (out). The following code produces the correct output, I just wondered if there is a more efficient way to do it?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I considered np.where() or something similar but can’t increment a count from that.

import pandas as pd

df = pd.DataFrame(
    [["", "c", "e"], ["a", "", ""], ["b", "d", "f"]], 
    columns = ["x", "y", "z"]
)
count = 1
out = []

# Loop over each column
for col in df.columns:
      tmp = []
      # Loop over each entry, if cell has text, add prefix
      for text in df[col]:
        if text:
          tmp.append(str(count) + "-" + text)
          out.append(text)
          count += 1
        else:
          tmp.append("")
      # Add prefixed column to df    
      df["pre-" + col] = tmp

Giving output:
Output DF

This doesn’t necessarily have to be done column-wise, as long as the numbers correspond to the order in out.

>Solution :

If you don’t care about the order of the counter, reshape with stack, increment with cumsum, then unstack:

s = df.replace('', np.nan).stack()

out = (s.notna().cumsum().astype(str)+'-'+s
      ).unstack().fillna('')

Output:

     x    y    z
0       1-c  2-e
1  3-a          
2  4-b  5-d  6-f

If order matters:

s = df.replace('', np.nan).T.stack()

out = (s.notna().cumsum().astype(str)+'-'+s
      ).unstack().T.fillna('')

Or using :

m = df.ne('').to_numpy()

tmp = (pd.DataFrame(m.T.cumsum(axis=None).reshape(m.shape).T,
                    index=df.index, columns=df.columns)
         .astype(str).add('-')
         .where(m, '')
      )

out = tmp+df

Output:

     x    y    z
0       3-c  5-e
1  1-a          
2  2-b  4-d  6-f

And to combine the output to the original DataFrame, join:

df.join(out.add_prefix('pre-'))

Output:

   x  y  z pre-x pre-y pre-z
0     c  e         3-c   5-e
1  a         1-a            
2  b  d  f   2-b   4-d   6-f
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading