
Pandas find and replace based on column items count

I have a DataFrame that looks like this:

import pandas as pd

all_data_set = [
        ('A','Area1','AA','A B D E','A B','D E'),
        ('B','Area1','AA','A B D E','A B','D E'),
        ('C','Area2','BB','C','C','C'),
        ('E','Area1','CC','A B D E','A B','D E'),
        ('F','Area3','BB','F G','G','F')
        ]

all_df = pd.DataFrame(data = all_data_set, columns = ['Name','Area','Type','Group','AA members','CC members'])

 Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB      F G          G          F

The last row (row 4) is incorrect.
Any row whose Type is BB should contain only its own Name (here F) in Group, AA members and CC members.

So it should look like this:


4    F  Area3   BB        F          F          F
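A quick way to list the rows that break this rule is to compare every member column against Name row by row. This is a sketch only; `member_cols`, `is_bb`, `matches_name` and `bad_rows` are illustrative names, not part of the original question.

```python
import pandas as pd

all_data_set = [
    ('A', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
    ('B', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
    ('C', 'Area2', 'BB', 'C', 'C', 'C'),
    ('E', 'Area1', 'CC', 'A B D E', 'A B', 'D E'),
    ('F', 'Area3', 'BB', 'F G', 'G', 'F'),
]
all_df = pd.DataFrame(all_data_set,
                      columns=['Name', 'Area', 'Type', 'Group', 'AA members', 'CC members'])

member_cols = ['Group', 'AA members', 'CC members']

# A BB row is valid only if every member column equals that row's own Name
is_bb = all_df['Type'] == 'BB'
# eq(..., axis=0) compares each column against the Name Series, aligned on the row index
matches_name = all_df[member_cols].eq(all_df['Name'], axis=0).all(axis=1)
bad_rows = all_df[is_bb & ~matches_name]
print(bad_rows.index.tolist())  # [4]
```

Row 2 (C) passes because all three member columns already equal its Name.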

To do this, I was trying to:

  1. Check where Type is BB and Group has exactly 2 items, like this:

    df = (all_df.loc[all_df['Type'] == 'BB', 'Group'].str.split().str.len() == 2)

  2. Then iterate over every row to find these cases.

  3. Make a new DataFrame from the rows to be dropped, and set Group, AA members and CC members to Name.

  4. Drop those rows from all_df.

  5. Merge the DataFrame from step 3 back into all_df.
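For comparison, the five steps above can be expressed without an explicit row loop. This is a sketch assuming the rule from step 1 (Type BB and exactly two space-separated items in Group); `group_len`, `mask`, `fixed` and `result` are illustrative names.

```python
import pandas as pd

all_data_set = [
    ('A', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
    ('B', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
    ('C', 'Area2', 'BB', 'C', 'C', 'C'),
    ('E', 'Area1', 'CC', 'A B D E', 'A B', 'D E'),
    ('F', 'Area3', 'BB', 'F G', 'G', 'F'),
]
all_df = pd.DataFrame(all_data_set,
                      columns=['Name', 'Area', 'Type', 'Group', 'AA members', 'CC members'])
member_cols = ['Group', 'AA members', 'CC members']

# Step 1: count the space-separated items in Group and select BB rows with exactly 2
group_len = all_df['Group'].str.split().str.len()
mask = (all_df['Type'] == 'BB') & (group_len == 2)

# Steps 2-3: the boolean mask replaces the row loop; copy the matching rows
# and overwrite each member column with that row's Name
fixed = all_df[mask].copy()
for col in member_cols:
    fixed[col] = fixed['Name']

# Steps 4-5: keep the untouched rows, append the corrected ones,
# and restore the original row order by index
result = pd.concat([all_df[~mask], fixed]).sort_index()
print(result)
```

The drop-and-concat round trip works, but the corrected rows could just as well be written back in place with `.loc`, which avoids building and re-sorting a second frame.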

Is there a better pandas way to do this?

Solution:

Try

# identify rows where Type is BB
m = all_df['Type'] == 'BB'
# for Type BB rows, replace Group, AA members and CC members values by Name
all_df.loc[m, ['Group', 'AA members', 'CC members']] = all_df.loc[m, 'Name']
print(all_df)
  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB        F          F          F
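If you prefer not to rely on a single Series being broadcast across several columns, the same replacement can be done one column at a time, where `.loc` clearly aligns the Name Series on the row index. The sketch below wraps this in a hypothetical helper, `fix_self_only_rows`, that returns a corrected copy instead of mutating the input.

```python
import pandas as pd

def fix_self_only_rows(df, type_value='BB',
                       cols=('Group', 'AA members', 'CC members')):
    """Hypothetical helper: return a copy in which every row of the given
    Type has each member column set to that row's Name."""
    out = df.copy()
    m = out['Type'] == type_value
    for col in cols:
        # single-column .loc assignment aligns the right-hand Series on the row index
        out.loc[m, col] = out.loc[m, 'Name']
    return out

all_data_set = [
    ('A', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
    ('B', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
    ('C', 'Area2', 'BB', 'C', 'C', 'C'),
    ('E', 'Area1', 'CC', 'A B D E', 'A B', 'D E'),
    ('F', 'Area3', 'BB', 'F G', 'G', 'F'),
]
all_df = pd.DataFrame(all_data_set,
                      columns=['Name', 'Area', 'Type', 'Group', 'AA members', 'CC members'])

fixed_df = fix_self_only_rows(all_df)
print(fixed_df)
```

Like the answer, this rewrites every BB row; for row 2 (C) that is a no-op because its values already equal its Name.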