Home Prefixing each item in a Pandas DataFrame with an ascending number if it has a value

Questions

Prefixing each item in a Pandas DataFrame with an ascending number if it has a value

November 17, 2023

I’m currently trying to find the best solution to this problem:

I import a csv into Python, some of the cells have values and some don’t. I want to loop through the dataframe and if a cell has a value, append a number to it which is incremented each time a new cell has text. These appended values are then saved to a new column

I also save each piece of text to another list as I need to use them in the next step (out). The following code produces the correct output, I just wondered if there is a more efficient way to do it?

I considered np.where() or something similar but can’t increment a count from that.

import pandas as pd

df = pd.DataFrame(
    [["", "c", "e"], ["a", "", ""], ["b", "d", "f"]], 
    columns = ["x", "y", "z"]
)
count = 1
out = []

# Loop over each column
for col in df.columns:
      tmp = []
      # Loop over each entry, if cell has text, add prefix
      for text in df[col]:
        if text:
          tmp.append(str(count) + "-" + text)
          out.append(text)
          count += 1
        else:
          tmp.append("")
      # Add prefixed column to df    
      df["pre-" + col] = tmp

Giving output:
Output DF

This doesn’t necessarily have to be done column-wise, as long as the numbers correspond to the order in out.

>Solution :

If you don’t care about the order of the counter, reshape with stack, increment with cumsum, then unstack:

s = df.replace('', np.nan).stack()

out = (s.notna().cumsum().astype(str)+'-'+s
      ).unstack().fillna('')

Output:

     x    y    z
0       1-c  2-e
1  3-a          
2  4-b  5-d  6-f

If order matters:

s = df.replace('', np.nan).T.stack()

out = (s.notna().cumsum().astype(str)+'-'+s
      ).unstack().T.fillna('')

Or using numpy:

m = df.ne('').to_numpy()

tmp = (pd.DataFrame(m.T.cumsum(axis=None).reshape(m.shape).T,
                    index=df.index, columns=df.columns)
         .astype(str).add('-')
         .where(m, '')
      )

out = tmp+df

Output:

     x    y    z
0       3-c  5-e
1  1-a          
2  2-b  4-d  6-f

And to combine the output to the original DataFrame, join:

df.join(out.add_prefix('pre-'))

Output:

   x  y  z pre-x pre-y pre-z
0     c  e         3-c   5-e
1  a         1-a            
2  b  d  f   2-b   4-d   6-f

dataframe

byMR

Published November 17, 2023

Add a comment

Practical advantages of std::copy, std::equal, std::fill over memcpy (and strcpy), memcmp (and strcmp), memset

byMR

November 17, 2023

Questions

how to solve unresolve attribute reference objects in pyCharm

byMR

November 17, 2023

Questions

Why is a class not assignable?

byMR

November 17, 2023

Questions

Download Large File Efficiently

byMR

November 17, 2023

Questions

Python generator missing a value during iteration

byMR

November 17, 2023

Questions

The "error TS7030: Not all code paths return a value." caused by "strict": false in tsconfig.json

byMR

November 17, 2023

Prefixing each item in a Pandas DataFrame with an ascending number if it has a value

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Practical advantages of std::copy, std::equal, std::fill over memcpy (and strcpy), memcmp (and strcmp), memset

how to solve unresolve attribute reference objects in pyCharm

Why is a class not assignable?

Download Large File Efficiently

Python generator missing a value during iteration

The "error TS7030: Not all code paths return a value." caused by "strict": false in tsconfig.json

Keep Up to Date with the Most Important News

Prefixing each item in a Pandas DataFrame with an ascending number if it has a value

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Practical advantages of std::copy, std::equal, std::fill over memcpy (and strcpy), memcmp (and strcmp), memset

how to solve unresolve attribute reference objects in pyCharm

Why is a class not assignable?

Download Large File Efficiently

Python generator missing a value during iteration

The "error TS7030: Not all code paths return a value." caused by "strict": false in tsconfig.json

Discover more from Dev solutions