Duplicate row based on specific conditions in column

As input I have a table like this example:

ID_EXAMPLE KRED_NUMMER SYSTEM BUKRS
1 100506 PPL 0002 (PG 1030);#1025;#1340
2 1656082 PASM 3333
3 1656099 DFMG 0716;#6008;#6104

And as you can see last column BUKRS can have different value formats. And I’d like for each code (4 digits) in BUKRS to create a separate row, but with the same values in other columns. Based on 2 conditions:

  • duplicate row for first 4 digits in BUKRS (if there is a string in the brackets, ignore it)
  • duplicate row for each 4 digits in BUKRS after #

The expected result:

ID_EXAMPLE KRED_NUMMER SYSTEM BUKRS My Comment
1 100506 PPL 0002 always take first 4 digits in BUKRS
1 100506 PPL 1025 4 digits after each # (after first #)
1 100506 PPL 1340 4 digits after each # (after second #)
2 1656082 PASM 3333 always take first 4 digits in field
3 1656099 DFMG 0716 always take first 4 digits in field
3 1656099 DFMG 6008 4 digits after each # (after first #)
3 1656099 DFMG 6104 4 digits after each # (after second #)

>Solution :

You can use str.findall and a short regex to get the numbers, then explode:

out = (df
 .assign(BUKRS=df['BUKRS'].str.findall(r'(?:^|#)(\d{4})'))
 .explode('BUKRS')
)

output:

   ID_EXAMPLE  KRED_NUMMER SYSTEM BUKRS
0           1       100506    PPL  0002
0           1       100506    PPL  1025
0           1       100506    PPL  1340
1           2      1656082   PASM  3333
2           3      1656099   DFMG  0716
2           3      1656099   DFMG  6008
2           3      1656099   DFMG  6104

Leave a Reply