I want to extract all the folder names that appear between "name='" and "/'" in the following string, using a regex:
string = "FileInfo(path='dbfs:/mnt/34334324/folder1/', name='folder1/', size=0), FileInfo(path='dbfs:/mnt/34334324/folder2/', name='folder2/', size=0),"
Expected result: ['folder1', 'folder2']
This is running in Databricks, so something like
from glob import glob
glob("/mnt/targetfiles/*/", recursive=True)
does not work.
>Solution :
Is there always a "/" in the name? Yes.
What happens if there are several? There are never several.
Can the names contain escaped characters? No.
Thus a simple regex would work:
import re
re.findall("(?<= name=').*?(?=/')", string)
output: ['folder1', 'folder2']
How it works:
(?<= name=') # lookbehind: must be preceded by " name='" (including the leading space)
.*? # lazy match: take the shortest possible string
(?=/') # lookahead: must be followed by "/'"
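As a variation on the explanation above, here is a minimal sketch that uses a capture group and a negated character class instead of lookarounds. The helper name extract_folder_names is hypothetical, and the pattern relies on the assumptions stated earlier (exactly one trailing "/" per folder name, no escaped characters). One difference from the lazy .*? version: because [^'/]+ cannot cross a quote or a slash, an entry whose name has no trailing "/" (a plain file, for instance) is skipped rather than being merged into a later match.

```python
import re

# Hypothetical helper; assumes each folder name ends in exactly one "/"
# and contains no quotes, slashes, or escaped characters.
NAME_PATTERN = re.compile(r" name='([^'/]+)/'")

def extract_folder_names(listing):
    """Return every folder name found in a printed FileInfo listing."""
    # findall returns only the captured group, i.e. the text before "/'"
    return NAME_PATTERN.findall(listing)

string = (
    "FileInfo(path='dbfs:/mnt/34334324/folder1/', name='folder1/', size=0), "
    "FileInfo(path='dbfs:/mnt/34334324/folder2/', name='folder2/', size=0),"
)

print(extract_folder_names(string))  # ['folder1', 'folder2']
```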