Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Extract several values from a row when a certain value is found using a list

I have a .csv file with 29 columns and 1692 rows.
The columns D_INT_1 and D_INT_2 are just dates.
I want to check for these 2 columns if there is dates between :>= "2022-03-01" and <= "2024-12-31.
And, if a value is found, I want to display the date found + the value of the column "NAME" that is located on the same row of said found value.

This is what I did right now, but it only grab the dates and not the adjacent value ('NAME').

# importing module
import pandas as pd
# reading CV file
df = pd.read_csv("scratch_2.csv")

# converting column data to list
D_INT_1 = df['D_INT_1'].tolist()
D_INT_2 = df['D_INT_2'].tolist()



ext = []

ext = [i for i in D_INT_1 + D_INT_2 if i >= "2022-03-01" and i <= "2024-12-31"]

print(*ext, sep="\n")

This is what I would like to get:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Example of DF:

NAME, ADDRESS, D_INT_1, D_INT_2
Mark, H4N1V8, 2023-01-02, 2019,-01-01

Expected output:

MARK, 2023-01-02

>Solution :

First for performance dont use loops, because exist vectorized alternatives unpivot by DataFrame.melt and filter by Series.between with DataFrame.loc:

df = df.melt(id_vars='NAME', value_vars=['D_INT_1','D_INT_2'], value_name='Date')

df1 = df.loc[df['Date'].between("2022-03-01","2024-12-31"), ['NAME','Date']]

print (df1)
   NAME       Date
0  Mark 2023-01-02

Or filter original DataFrame and last join in concat:

df1 = df.loc[df['D_INT_1'].between("2022-03-01","2024-12-31"), ['NAME','D_INT_1']]
df2 = df.loc[df['D_INT_2'].between("2022-03-01","2024-12-31"), ['NAME','D_INT_2']]

df = pd.concat([df1.rename(columns={'D_INT_1':'date'}), 
                df2.rename(columns={'D_INT_2':'date'})])

print (df)
   NAME       date
0  Mark 2023-01-02

Last if need loops output with print:

for i in df.itertuples():
    print (i.NAME, i.Date)
    
Mark 2023-01-02 00:00:00
Mark 2019-01-01 00:00:00
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading