Find the unique substring pattern in a list of strings with Python

I have a list of strings as below:

['/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_5097-ses-2019-05-07_09_56_14.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_4061-ses-2017-09-26_14_07_37.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-002_S_1280-ses-2017-03-13_13_38_31.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-082_S_5282-ses-2019-06-17_10_11_15.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-018_S_4399-ses-2019-08-06_13_03_58.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-123_S_0106-ses-2018-10-11_12_54_59.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_2333-ses-2018-12-26_15_31_55.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-031_S_2018-ses-2019-01-24_11_26_13.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_0679-ses-2017-07-05_09_46_36.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0303-ses-2017-05-11_13_39_46.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0454-ses-2017-09-06_09_41_25.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_2187-ses-2019-10-09_13_19_17.0.txt',
 '/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-116_S_4043-ses-2018-03-02_10_03_10.0.txt',

I want to extract the unique subject IDs, which follow the pattern ‘sub-???_S_????’, from the list.

So far I can do it with:

import re
unique_subject = re.search(r'(.*)_sub-(.*)-ses(.*)\.txt', all_files[0]).group(2)

But that only works for a single string. I need to do it with a loop.

unique_subject = set()

for f in all_files:
    unique_subject.add(re.search(r'(.*)_sub-(.*)-ses(.*)\.txt', f).group(2))

I am wondering whether there is a better way to do this. Ultimately, I would like to get the first session for each subject. Is there a fast way to do that?

> Solution:

Try using this:

# Raw string avoids invalid-escape warnings; set() drops repeated subject IDs.
l = set(re.findall(r'\d{3}_S_\d{4}', ''.join(all_files)))
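
The question also asks for the first session of each subject, which the one-liner does not cover. Here is a minimal sketch of one way to get both, assuming every filename follows the layout seen in the sample paths (sub-<3 digits>_S_<4 digits>-ses-<date_time>). The names PATTERN, first_sessions and sample_files are hypothetical, and the second sample path is invented to give one subject two sessions.

```python
import re

# Assumed filename layout, inferred from the sample paths in the question:
#   ..._sub-<3 digits>_S_<4 digits>-ses-<YYYY-MM-DD_HH_MM_SS>.0.txt
PATTERN = re.compile(
    r"sub-(\d{3}_S_\d{4})-ses-(\d{4}-\d{2}-\d{2}_\d{2}_\d{2}_\d{2})"
)


def first_sessions(paths):
    """Map each subject ID to the path of its earliest session."""
    earliest = {}  # subject ID -> (session timestamp, path)
    for path in paths:
        match = PATTERN.search(path)
        if match is None:
            continue  # skip files that do not follow the assumed layout
        subject, session = match.groups()
        # Zero-padded timestamps sort chronologically as plain strings.
        if subject not in earliest or session < earliest[subject][0]:
            earliest[subject] = (session, path)
    return {subject: path for subject, (_, path) in earliest.items()}


# Hypothetical shortened paths, for illustration only.
sample_files = [
    "roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt",
    "roi_signals_power264_sub-141_S_0767-ses-2017-03-01_09_00_00.0.txt",
    "roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt",
]

first = first_sessions(sample_files)
unique_subjects = sorted(first)  # the dictionary keys are the unique subject IDs
print(unique_subjects)
print(first["141_S_0767"])
```

With the real data, calling first_sessions(all_files) should return one path per subject. The string comparison relies on the timestamps being zero-padded, as they are in the sample paths. If some files used a different date format, parsing the timestamps with datetime.strptime would be the safer choice.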