I have a dataframe like e.g.:
id some_string
1. blah,count=1,blah
2. blah,blah
3 blah,count=4,blah
4. blah,blah
5 blah,count=4,blah
6. blah,count=3,blah
I would like to use split to set a separate column with the value of count to get:
id some_string count
1 blah,count=1,blah 1
2 blah,blah 0
3 blah,count=4,blah 4
4 blah,blah 0
5 blah,count=4,blah 4
6 blah,count=3,blah 3
I tried:
df['count'].str.split('[count=|,]',expand=True)[3]
but it rightly complains that:
Length of values (4) does not match length of index (6)
Is there an obvious way of doing this short of looping through the dataframe entries?
>Solution :
Don’t split
, use extract
:
df['count'] = (df['some_string'].str.extract(r'count=(\d+)', expand=False)
.fillna(0).astype(int)
)
Output:
id some_string count
0 1.0 blah,count=1,blah 1
1 2.0 blah,blah 0
2 3.0 blah,count=4,blah 4
3 4.0 blah,blah 0
4 5.0 blah,count=4,blah 4
5 6.0 blah,count=3,blah 3
Regex:
count= # match literal "count="
( # start capturing group
\d+ # capture one or more (+) digits (\d)
) # end capturing group