Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to generate label by sparse cumcount

Here’s my master dataset

Id     Data          Category      Code
1      tey           Airport       AIR_02
2      fg            Hospital      HEA_04
3      dffs          Airport       AIR_01
4      dsfs          Hospital      HEA_03      
5      fdsf          Airport       AIR_04

Here’s the data I wanr to merge

Id     Data             Category
1      tetyer           Airport
2      fgdss            Hospital
3      dffsdsa          Airport
4      dsfsas           Hospital      
5      fdsfada          Airport

My Expected Output

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Id     Data          Category      Code
1      tey           Airport       AIR_02
2      fg            Hospital      HEA_04
3      dffs          Airport       AIR_01
4      dsfs          Hospital      HEA_03      
5      fdsf          Airport       AIR_04
6      tetyer        Airport       AIR_03
7      fgdss         Hospital      HEA_01
8      dffsdsa       Airport       AIR_05
9      dsfsas        Hospital      HEA_02
10      fdsfada      Airport       AIR_06

Note:
HEA_01is not avaliable on existing dataset, Every Hospital Code start with HEA_ and Every airport start with AIR_, code 01,02 etc is by avaliability

>Solution :

Use:

#split Code by _
df1[['a','b']] = df1['Code'].str.split('_', expand=True)
#converting values to integers
df1['b'] = df1['b'].astype(int)
#aggregate for list and first value for mapping
df11 = df1.groupby(['Category']).agg({'a':'first', 'b':list})

#get difference by np.arange with used values
def f(x):
    L = df11['b'][x.name]
    a = np.arange(1, len(x) + len(L) + 1)
    #difference with filter same number of values like length of group
    return np.setdiff1d(a, L)[:len(x)]


df2['Code'] = df2.groupby('Category')['Category'].transform(f)
#created Code with join
df2['Code'] = df2['Category'].map(df11['a']) + '_' + df2['Code'].astype(str).str.zfill(2)
print (df2)
   Id     Data  Category    Code
0   1   tetyer   Airport  AIR_03
1   2    fgdss  Hospital  HEA_01
2   3  dffsdsa   Airport  AIR_05
3   4   dsfsas  Hospital  HEA_02
4   5  fdsfada   Airport  AIR_06

df = pd.concat([df1.drop(['a','b'], 1), df2], ignore_index=True)
print (df)
   Id     Data  Category    Code
0   1      tey   Airport  AIR_02
1   2       fg  Hospital  HEA_04
2   3     dffs   Airport  AIR_01
3   4     dsfs  Hospital  HEA_03
4   5     fdsf   Airport  AIR_04
5   1   tetyer   Airport  AIR_03
6   2    fgdss  Hospital  HEA_01
7   3  dffsdsa   Airport  AIR_05
8   4   dsfsas  Hospital  HEA_02
9   5  fdsfada   Airport  AIR_06
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading