Here’s my master dataset
Id Data Category Code
1 tey Airport AIR_02
2 fg Hospital HEA_04
3 dffs Airport AIR_01
4 dsfs Hospital HEA_03
5 fdsf Airport AIR_04
Here’s the data I wanr to merge
Id Data Category
1 tetyer Airport
2 fgdss Hospital
3 dffsdsa Airport
4 dsfsas Hospital
5 fdsfada Airport
My Expected Output
Id Data Category Code
1 tey Airport AIR_02
2 fg Hospital HEA_04
3 dffs Airport AIR_01
4 dsfs Hospital HEA_03
5 fdsf Airport AIR_04
6 tetyer Airport AIR_03
7 fgdss Hospital HEA_01
8 dffsdsa Airport AIR_05
9 dsfsas Hospital HEA_02
10 fdsfada Airport AIR_06
Note:
HEA_01is not avaliable on existing dataset, Every Hospital Code start with HEA_ and Every airport start with AIR_, code 01,02 etc is by avaliability
>Solution :
Use:
#split Code by _
df1[['a','b']] = df1['Code'].str.split('_', expand=True)
#converting values to integers
df1['b'] = df1['b'].astype(int)
#aggregate for list and first value for mapping
df11 = df1.groupby(['Category']).agg({'a':'first', 'b':list})
#get difference by np.arange with used values
def f(x):
L = df11['b'][x.name]
a = np.arange(1, len(x) + len(L) + 1)
#difference with filter same number of values like length of group
return np.setdiff1d(a, L)[:len(x)]
df2['Code'] = df2.groupby('Category')['Category'].transform(f)
#created Code with join
df2['Code'] = df2['Category'].map(df11['a']) + '_' + df2['Code'].astype(str).str.zfill(2)
print (df2)
Id Data Category Code
0 1 tetyer Airport AIR_03
1 2 fgdss Hospital HEA_01
2 3 dffsdsa Airport AIR_05
3 4 dsfsas Hospital HEA_02
4 5 fdsfada Airport AIR_06
df = pd.concat([df1.drop(['a','b'], 1), df2], ignore_index=True)
print (df)
Id Data Category Code
0 1 tey Airport AIR_02
1 2 fg Hospital HEA_04
2 3 dffs Airport AIR_01
3 4 dsfs Hospital HEA_03
4 5 fdsf Airport AIR_04
5 1 tetyer Airport AIR_03
6 2 fgdss Hospital HEA_01
7 3 dffsdsa Airport AIR_05
8 4 dsfsas Hospital HEA_02
9 5 fdsfada Airport AIR_06