Use str.split() to set value of column in dataframe, but only for some rows

I have a dataframe like e.g.:

id some_string  
1. blah,count=1,blah
2. blah,blah
3  blah,count=4,blah
4. blah,blah
5  blah,count=4,blah
6. blah,count=3,blah

I would like to use split to set a separate column with the value of count to get:

id some_string        count
1  blah,count=1,blah   1
2  blah,blah           0
3  blah,count=4,blah   4
4  blah,blah           0 
5  blah,count=4,blah   4
6  blah,count=3,blah   3

I tried:

df['count'].str.split('[count=|,]',expand=True)[3]

but it rightly complains that:

 Length of values (4) does not match length of index (6)

Is there an obvious way of doing this short of looping through the dataframe entries?

>Solution :

Don’t split, use extract:

df['count'] = (df['some_string'].str.extract(r'count=(\d+)', expand=False)
               .fillna(0).astype(int)
              )

Output:

    id        some_string  count
0  1.0  blah,count=1,blah      1
1  2.0          blah,blah      0
2  3.0  blah,count=4,blah      4
3  4.0          blah,blah      0
4  5.0  blah,count=4,blah      4
5  6.0  blah,count=3,blah      3

Regex:

count=    # match literal "count="
(         # start capturing group
\d+       # capture one or more (+) digits (\d)
)         # end capturing group

regex demo

Leave a Reply