I have a pandas DataFrame with a column containing arrays, and I want to filter the rows based on the values in that column. For example, given the df
subject position current_choice inpt
837 ash [0.0, 0.005792712956593603, 0.0207826510381976... -1.0 [-0.0625, 1.0, 1.0]
838 zad [0.0, -0.00044640180445325853, -0.000892803608... 1.0 [1.0, -1.0, -1.0]
839 pop [0.0, 5.2698260765019904e-05, 0.00010539652153... 1.0 [0.0625, 1.0, 1.0]
840 syc [0.0, 0.0031267642423531117, 0.014658282457501... 1.0 [1.0, 1.0, 1.0]
841 ash [0.0, -0.00013353844781401902, -0.000267076895... -1.0 [-0.125, 1.0, 1.0]
I want to select the rows where inpt is [0.0625, 1.0, 1.0] with df[df['inpt']==[0.0625, 1.0, 1.0]], but I get the error
ValueError: ('Lengths must match to compare', (95994,), (3,))
Is there a way around it?
Solution:
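Comparing the whole column to a list with == does not work, because pandas treats the list as one value per row and requires the lengths to match. To filter on the values in the 'inpt' column, use the apply method with a lambda that compares each element of 'inpt' to the target list as a whole. This returns one True/False per row, which can be used as a boolean mask. Hope this helps.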
import pandas as pd

# Sample DataFrame rebuilt from the rows shown in the question
data = {'subject': ['ash', 'zad', 'pop', 'syc', 'ash'],
        'position': [[0.0, 0.005792712956593603, 0.0207826510381976],
                     [0.0, -0.00044640180445325853, -0.000892803608],
                     [0.0, 5.2698260765019904e-05, 0.00010539652153],
                     [0.0, 0.0031267642423531117, 0.0146582824575],
                     [0.0, -0.00013353844781401902, -0.000267076895]],
        'current_choice': [-1.0, 1.0, 1.0, 1.0, -1.0],
        'inpt': [[-0.0625, 1.0, 1.0], [1.0, -1.0, -1.0], [0.0625, 1.0, 1.0],
                 [1.0, 1.0, 1.0], [-0.125, 1.0, 1.0]]}
df = pd.DataFrame(data)

target_array = [0.0625, 1.0, 1.0]
# apply() compares each list in 'inpt' to target_array as a whole,
# giving one True/False per row that works as a boolean mask
filtered_df = df[df['inpt'].apply(lambda x: x == target_array)]
print(filtered_df)
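Resulting output: only the row with index 2 (subject pop) is kept, since it is the only row whose inpt equals [0.0625, 1.0, 1.0].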
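The ValueError in the question comes from how pandas reads a plain list on the right-hand side of ==: as a sequence of values to compare element-wise against the rows, so its length (3) must equal the number of rows (95994). The small reproduction below uses a hypothetical five-row Series standing in for 'inpt'. It shows the error, and also that when the lengths happen to match, pandas silently pairs row i with target[i] and returns all False instead of raising. The exact error text may differ between pandas versions.

```python
import pandas as pd

# Hypothetical five-row stand-in for the question's 'inpt' column
s = pd.Series([[-0.0625, 1.0, 1.0], [1.0, -1.0, -1.0], [0.0625, 1.0, 1.0],
               [1.0, 1.0, 1.0], [-0.125, 1.0, 1.0]])
target = [0.0625, 1.0, 1.0]

# 5 rows vs. a 3-element list: pandas wants one right-hand value per row
try:
    s == target
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With exactly 3 rows no error is raised, but row i is compared to target[i]
# (a list vs. a float), so every result is False, even for the matching row
equal_length_result = (s.iloc[:3] == target).tolist()
print(equal_length_result)

# Comparing each element as a whole gives the intended per-row answer
per_row_result = s.apply(lambda x: x == target).tolist()
print(per_row_result)
```

The second case is the more dangerous one, since it fails quietly rather than with an error.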
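If the 'inpt' column holds NumPy arrays rather than lists, x == target_array compares element-wise and returns an array of booleans instead of a single True/False, so it cannot be used as a mask directly. In that case, convert the target to a NumPy array and compare with np.array_equal, which returns one bool per row: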
import numpy as np
target_array = np.array([0.0625, 1.0, 1.0])  # target as a NumPy array
df['inpt'] = df['inpt'].apply(np.array)  # make every entry a NumPy array
# np.array_equal returns one bool per row, True only if shape and values match
filtered_df = df[df['inpt'].apply(lambda x: np.array_equal(x, target_array))]
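With roughly 95,994 rows, a per-row apply() can be noticeably slower than vectorized NumPy operations. A possible alternative, sketched below on a hypothetical cut-down version of the data, stacks the 'inpt' column into one 2-D array and compares all rows at once through broadcasting. It assumes every entry in 'inpt' has the same length (3 here); np.isclose is shown as an optional tolerant variant for floats that come out of arithmetic rather than being typed in exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down version of the question's DataFrame
df = pd.DataFrame({
    "subject": ["ash", "zad", "pop", "syc", "ash"],
    "inpt": [[-0.0625, 1.0, 1.0], [1.0, -1.0, -1.0], [0.0625, 1.0, 1.0],
             [1.0, 1.0, 1.0], [-0.125, 1.0, 1.0]],
})
target_array = np.array([0.0625, 1.0, 1.0])

# Stack the entries into shape (n_rows, 3); requires equal-length entries
inpt_2d = np.array(df["inpt"].tolist(), dtype=float)

# Broadcasting compares every row with target_array element-wise;
# .all(axis=1) reduces that to one bool per row
mask = (inpt_2d == target_array).all(axis=1)
filtered_df = df[mask]
print(filtered_df)

# Optional: tolerate tiny floating-point differences instead of exact equality
mask_close = np.isclose(inpt_2d, target_array).all(axis=1)
print(df[mask_close])
```

If the entries have different lengths, np.array(..., dtype=float) raises a ValueError, whereas np.array_equal in the apply() version simply returns False for a mismatched shape, so the apply() version is the safer choice for ragged data.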
