Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

created a new column in a pandas dataframe based on multiple different conditions

I found a similar question here, but it did not help me because my case is different.

I have a huge dataframe that looks more or less like the example below:

        x   y   st  mt  ast sr  c7  z   w
    0   mt      2   1   4   2   2   a   yes
    1       b   3   3   3   3   3       yes
    2           1   1   2   4   3       yes
    3       d   3   3   1   2   4   d   
    4       e   2   3   2   1   4       
    5   nlp     5   5   5   5   5   f   yes

my goal is to create another column named "other_flagged" based on the following condition:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

‘if columns x, y and z are NOT filled with a string, but w has a "yes", fill column "other_flagged" with the string ‘flagged’. So, my desired output would be:

        x   y   st  mt  ast sr  c7  z   w   other flagged
    0   mt      2   1   4   2   2   a   yes     
    1       b   3   3   3   3   3       yes    
    2           1   1   2   4   3       yes     flagged
    3       d   3   3   1   2   4   d   
    4       e   2   3   2   1   4       
    5   nlp     5   5   5   5   5   f   yes

in this case, only row 2 receives the string "flagged" because x, y and z are empty but column w has the string ‘yes’. I tried the code below but the new column created is being completely populated with the string ‘flagged’ but not based on the conditions:

 survey.loc[(survey['x'] != '') & (survey['y'] !='') & (survey['z'] != '') & 
(survey['w'] == 'yes'), 'other_flagged'] ='flagged'

also tried:

survey.loc[(survey['x'] != '[a-zA-Z0-9-]+') & (survey['y'] != '[a-zA-Z0-9-]+') & 
(survey['z'] != '[a-zA-Z0-9-]+') & (survey['w'] == 'yes'), 'other_flagged']='flagged'

How can I achieve this goal?

>Solution :

Use if not exist values are empty strings is possible test all columns together use DataFrame.all:

m = (survey[['x', 'y', 'z']] == '').all(axis=1) & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

print (survey)
     x  y  st  mt  ast  sr  c7  z    w other_flagged
0    m  t   2   1    4   2   2  a  yes           NaN
1       b   3   3    3   3   3     yes           NaN
2           1   1    2   4   3     yes       flagged
3       d   3   3    1   2   4  d                NaN
4       e   2   3    2   1   4                   NaN
5  nlp      5   5    5   5   5  f  yes           NaN

Your solution – change != to ==:

m = (survey['x'] == '') & (survey['y'] =='') & (survey['z'] == '') & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

If values are missing values:

m = survey[['x', 'y', 'z']].isna().all(1) & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

print (survey)
     x    y  st  mt  ast  sr  c7    z    w other_flagged
0    m    t   2   1    4   2   2    a  yes           NaN
1  NaN    b   3   3    3   3   3  NaN  yes           NaN
2  NaN  NaN   1   1    2   4   3  NaN  yes       flagged
3  NaN    d   3   3    1   2   4    d  NaN           NaN
4  NaN    e   2   3    2   1   4  NaN  NaN           NaN
5  nlp  NaN   5   5    5   5   5    f  yes           NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading