I have a .csv file with 29 columns and 1692 rows.
The columns D_INT_1 and D_INT_2 are just dates.
I want to check for these 2 columns if there is dates between :>= "2022-03-01" and <= "2024-12-31.
And, if a value is found, I want to display the date found + the value of the column "NAME" that is located on the same row of said found value.
This is what I did right now, but it only grab the dates and not the adjacent value ('NAME').
# importing module
import pandas as pd
# reading CV file
df = pd.read_csv("scratch_2.csv")
# converting column data to list
D_INT_1 = df['D_INT_1'].tolist()
D_INT_2 = df['D_INT_2'].tolist()
ext = []
ext = [i for i in D_INT_1 + D_INT_2 if i >= "2022-03-01" and i <= "2024-12-31"]
print(*ext, sep="\n")
This is what I would like to get:
Example of DF:
NAME, ADDRESS, D_INT_1, D_INT_2
Mark, H4N1V8, 2023-01-02, 2019,-01-01
Expected output:
MARK, 2023-01-02
>Solution :
First for performance dont use loops, because exist vectorized alternatives unpivot by DataFrame.melt and filter by Series.between with DataFrame.loc:
df = df.melt(id_vars='NAME', value_vars=['D_INT_1','D_INT_2'], value_name='Date')
df1 = df.loc[df['Date'].between("2022-03-01","2024-12-31"), ['NAME','Date']]
print (df1)
NAME Date
0 Mark 2023-01-02
Or filter original DataFrame and last join in concat:
df1 = df.loc[df['D_INT_1'].between("2022-03-01","2024-12-31"), ['NAME','D_INT_1']]
df2 = df.loc[df['D_INT_2'].between("2022-03-01","2024-12-31"), ['NAME','D_INT_2']]
df = pd.concat([df1.rename(columns={'D_INT_1':'date'}),
df2.rename(columns={'D_INT_2':'date'})])
print (df)
NAME date
0 Mark 2023-01-02
Last if need loops output with print:
for i in df.itertuples():
print (i.NAME, i.Date)
Mark 2023-01-02 00:00:00
Mark 2019-01-01 00:00:00