I have the following csv file:
Column1;Column2;Column3;Column4
A;B;6;6200
B;D;5;5000
E;F;6;5100
F;F;6;6200
Now I want to check, if column4 always begins with a ‘6‘, if column3 has an entry 6. In case, where this does not match I want to print a message. Same holds for the case vice versa: In case column4 does not begin with a ‘6‘, but column3 has an entry 6. Both columns are string.
I tried:
if ((df[df["Column3"] == "6"] and df['Column4'].str[0:1] <> "6") or (df[df["Column3"] <> "6"] and df['Column4'].str[0:1] == "6")):
print("Error")
But doesn’t work. What did I miss?
>Solution :
You can use two boolean masks and check their identity:
out = df[df['Column3'].eq('6') == df['Column4'].str.startswith('6')]
NB. assuming you have strings. If not, add `.astype(str) right after slicing.
out = df[df['Column3'].astype(str).eq('6') == df['Column4'].astype(str).str.startswith('6')]
Output:
0 A B 6 6200
1 B D 5 5000
3 F F 6 6200
printing a message:
m = df['Column3'].eq('6') != df['Column4'].str.startswith('6')
if m.any():
print(f'Error: row(s) {",".join(m[m].index.astype(str))} are invalid')
Output:
Error: row(s) 2 are invalid