Duplicate row based on specific conditions in column

As input I have a table like this example:

ID_EXAMPLE	KRED_NUMMER	SYSTEM	BUKRS
1	100506	PPL	0002 (PG 1030);#1025;#1340
2	1656082	PASM	3333
3	1656099	DFMG	0716;#6008;#6104

And as you can see last column BUKRS can have different value formats. And I’d like for each code (4 digits) in BUKRS to create a separate row, but with the same values in other columns. Based on 2 conditions:

duplicate row for first 4 digits in BUKRS (if there is a string in the brackets, ignore it)
duplicate row for each 4 digits in BUKRS after #

The expected result:

ID_EXAMPLE	KRED_NUMMER	SYSTEM	BUKRS	My Comment
1	100506	PPL	0002	always take first 4 digits in BUKRS
1	100506	PPL	1025	4 digits after each # (after first #)
1	100506	PPL	1340	4 digits after each # (after second #)
2	1656082	PASM	3333	always take first 4 digits in field
3	1656099	DFMG	0716	always take first 4 digits in field
3	1656099	DFMG	6008	4 digits after each # (after first #)
3	1656099	DFMG	6104	4 digits after each # (after second #)

>Solution :

You can use str.findall and a short regex to get the numbers, then explode:

out = (df
 .assign(BUKRS=df['BUKRS'].str.findall(r'(?:^|#)(\d{4})'))
 .explode('BUKRS')
)

output:

   ID_EXAMPLE  KRED_NUMMER SYSTEM BUKRS
0           1       100506    PPL  0002
0           1       100506    PPL  1025
0           1       100506    PPL  1340
1           2      1656082   PASM  3333
2           3      1656099   DFMG  0716
2           3      1656099   DFMG  6008
2           3      1656099   DFMG  6104

Related

Leave a ReplyCancel reply