I am trying to extract the value between ":" and "-" from each entry in the column below:
>>> all_cancers.iloc[:,3]
0 chr1:100414771-100414772
1 chr1:10506157-10506158
2 chr1:109655506-109655507
3 chr1:113903257-113903258
4 chr1:117598869-117598870
I tried re.findall('\:(.*?)\-', all_cancers.iloc[:,3].astype(str)), but it raises the following error: TypeError: expected string or bytes-like object.
What is missing here?
>Solution :
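re.findall expects a single string, not a pandas Series, which is why you get the TypeError. Apply the regex to each element instead. You can use this pattern: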
In [33]: re.match(r'.*:(.*)-',"chr1:100414771-100414772").group(1)
Out[33]: '100414771'
In a DataFrame you can do this with apply + lambda:
all_cancers.iloc[:,3].apply(lambda x: re.match(r'.*:(.*)-', x).group(1))
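For reference, here is a minimal self-contained sketch of the apply approach. The Series name coords is a hypothetical stand-in for all_cancers.iloc[:,3], and the helper start_position is made up for illustration. It uses a slightly stricter pattern, :(\d+)-, on the assumption that the value is always digits, and returns None instead of raising when a row does not match.

```python
import re
import pandas as pd

# Hypothetical stand-in for all_cancers.iloc[:, 3]
coords = pd.Series([
    "chr1:100414771-100414772",
    "chr1:10506157-10506158",
    "chr1:109655506-109655507",
])

# Assumption: the value between ':' and '-' consists only of digits
pattern = re.compile(r":(\d+)-")

def start_position(value):
    """Return the text between ':' and '-', or None if there is no match."""
    match = pattern.search(str(value))
    return match.group(1) if match else None

# re functions take one string at a time, so apply the helper per element
starts = coords.apply(start_position)
print(starts.tolist())
```

Guarding against a missing match matters because re.match(...).group(1) raises AttributeError on any row that does not fit the pattern.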
Or use the vectorized str.extract:
all_cancers.iloc[:,3].str.extract(r'.*:(.*)-')
(credit: OlvinRoght’s comment)
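A small runnable sketch of the str.extract route, in case it helps. The Series coords is again a hypothetical stand-in for all_cancers.iloc[:,3], and the digits-only pattern is an assumption about the data. Passing expand=False returns a Series rather than a one-column DataFrame, and pd.to_numeric converts the extracted strings to integers if you need numbers.

```python
import pandas as pd

# Hypothetical stand-in for all_cancers.iloc[:, 3]
coords = pd.Series([
    "chr1:100414771-100414772",
    "chr1:10506157-10506158",
    "chr1:109655506-109655507",
])

# One capture group + expand=False gives back a Series of strings
starts = coords.str.extract(r":(\d+)-", expand=False)

# Optional: convert the extracted strings to numbers
starts_int = pd.to_numeric(starts)

print(starts.tolist())
print(starts_int.tolist())
```

Rows that do not match come back as NaN with str.extract, so no explicit error handling is needed here.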