python pandas regex find pattern from another row

I have a pandas DataFrame with a column of file paths like this:

file_path
/home
/home/folder1
/home/folder1/file1.xlsx
/home/folder1/file2.xlsx
/home/folder2
/home/folder2/date
/home/folder2/date/dates.txt
/home/folder3

I would like to put each row's parent path in a new column. If a path has no parent, the value should be "ROOT":

file_path                     parent_path
/home                         ROOT
/home/folder1                 /home
/home/folder1/file1.xlsx      /home/folder1
/home/folder1/file2.xlsx      /home/folder1
/home/folder2                 /home
/home/folder2/date            /home/folder2
/home/folder2/date/dates.txt  /home/folder2/date
/home/folder3                 /home

My attempt:

import re
import pandas as pd

df = pd.DataFrame(["/home", "/home/folder1", "/home/folder1/file1.xlsx",
                   "/home/folder1/file1.xlsx", "/home/folder1/file2.xlsx", "/home/folder2",
                   "/home/folder2/date", "/home/folder2/date/dates.txt", "/home/folder3"], columns=["file_path"])

# Get the list of unique paths
file_paths = df.file_path.unique()

def match_parent(x, file_paths):
    x = x.split('/')
    levels = len(x)
    # Check that parent contains all elements of x and the length is 1 less

I was thinking of writing a function that, for each row:

  1. Finds the paths that are exactly one level shorter than the current row, AND

  2. Checks that all the preceding path components match exactly

How can I do that?

Solution:

Use pathlib.Path.parent to extract the parent, as follows:

import pandas as pd
import pathlib

df = pd.DataFrame(["/home", "/home/folder1", "/home/folder1/file1.xlsx",
                   "/home/folder1/file1.xlsx", "/home/folder1/file2.xlsx", "/home/folder2",
                   "/home/folder2/date", "/home/folder2/date/dates.txt", "/home/folder3"], columns=["file_path"])


df["parent"] = df["file_path"].apply(lambda x: pathlib.Path(x).parent)
print(df)

Output

                      file_path              parent
0                         /home                   /
1                 /home/folder1               /home
2      /home/folder1/file1.xlsx       /home/folder1
3      /home/folder1/file1.xlsx       /home/folder1
4      /home/folder1/file2.xlsx       /home/folder1
5                 /home/folder2               /home
6            /home/folder2/date       /home/folder2
7  /home/folder2/date/dates.txt  /home/folder2/date
8                 /home/folder3               /home
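
To see why the first row comes out as "/", here is a minimal sketch of how .parent behaves on a few paths. It uses pathlib.PurePosixPath instead of pathlib.Path (an assumption made here so the result does not depend on the operating system), and parent_of is a hypothetical helper name used only for illustration.

```python
from pathlib import PurePosixPath


def parent_of(path: str) -> str:
    """Return the parent directory of a POSIX-style path as a string."""
    return str(PurePosixPath(path).parent)


examples = ["/home", "/home/folder1/file1.xlsx", "/"]

# Map each example path to its parent
parents = {p: parent_of(p) for p in examples}

for path, parent in parents.items():
    print(f"{path} -> {parent}")
```

A top-level path such as "/home" has the root "/" as its parent, and the root is its own parent, so .parent never returns an empty value.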

To match the expected output exactly, replace the root parent "/" with "ROOT":

# Store parents as plain strings, then relabel the root "/" as "ROOT"
df["parent"] = df["file_path"].apply(lambda x: pathlib.Path(x).parent.as_posix()).replace("/", "ROOT")
print(df)

Output

                      file_path              parent
0                         /home                ROOT
1                 /home/folder1               /home
2      /home/folder1/file1.xlsx       /home/folder1
3      /home/folder1/file1.xlsx       /home/folder1
4      /home/folder1/file2.xlsx       /home/folder1
5                 /home/folder2               /home
6            /home/folder2/date       /home/folder2
7  /home/folder2/date/dates.txt  /home/folder2/date
8                 /home/folder3               /home
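
The row-matching idea from the question (a parent counts only if it appears as another row) can also be sketched with pandas string methods alone. This is an alternative to the pathlib approach above, not part of the original answer. It assumes POSIX-style paths separated by "/", and the names candidate and known are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"file_path": [
    "/home",
    "/home/folder1",
    "/home/folder1/file1.xlsx",
    "/home/folder1/file2.xlsx",
    "/home/folder2",
    "/home/folder2/date",
    "/home/folder2/date/dates.txt",
    "/home/folder3",
]})

# Drop the last path component: "/home/folder1" -> "/home", "/home" -> ""
candidate = df["file_path"].str.rsplit("/", n=1).str[0]

# Keep the candidate only if it exists as a row of its own; otherwise use "ROOT"
known = set(df["file_path"])
df["parent_path"] = candidate.where(candidate.isin(known), "ROOT")

print(df)
```

Unlike the pathlib version, this labels a row "ROOT" whenever its parent is absent from the column, which may or may not be what you want if the data has gaps.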