Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to replace values by users correctly based on reference value?

I have a dataframe where for each user I have always 2 pairs by goods_type and business KPI for EACH USER such as:

user | amount   | goods_type | business
1    | $10      | used       | cost
1    | $30      | used       | revenue
1    | $15      | new        | cost
1    | $50      | new        | revenue

Now, I want to add a segment column using np.where in pandas but my logic applies only to rows where business = revenue. As far as the cost rows go, the segment value should be the same as the revenue for the SAME USER and SAME GOODS_TYPE pair, not based on the np.where clause. The segment value for all "revenue" rows is determined like this:

np.where( (df.amount < 15) & (df.goods_type =='used') , 'low', 'high' )

How would I do this so it looks like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

user | amount   | goods_type | business | segment
1    | $10      | used       | cost     | low (this should be based on the segment value of the associated revenue row below!)
1    | $30      | used       | revenue. | low (this comes from the logic)
1    | $15      | new        | cost     | high (this should be based on the segment value of the associated revenue row below!)
1    | $50      | new        | revenue  | high (this comes from the logic)

>Solution :

First, replace the dollar sign with blank. Use np.where to replace your values as you did before. Then set the segment values in business that are in the same row as cost to NaN values and fill them with the next value, using bfill.

df.amount=df.amount.str.replace('$','').astype(int)
df['segment'] = np.where(((df.amount <= 30) & (df.goods_type =='used')), 'low', 'high')
df.loc[df['business']=='cost', 'segment'] = np.nan
df.fillna(method='bfill', inplace=True)

Output:

    user    amount  goods_type  business    segment
0   1         10    used    cost    low
1   1         30    used    revenue low
2   1         15    new cost    high
3   1         50    new revenue high
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading