I have a dataframe that contains images:
    SOME_COL  SOME_COL  IMAGE_MAIN  IMAGE_2  IMAGE_3  IMAGE_4  IMAGE_5  IMAGE_6
    *         *         0           1        2        3        NaN      5
I want to drop the IMAGE_[2..6] columns and create a new one:
    SOME_COL  SOME_COL  IMAGES
    *         *         [0,1,2,3,5]
If any image is NaN, I would like to skip that value instead of adding NaN to the list.
I tried this but it’s obviously not a good way to do that:
    main_image = data_main['IMAGE_MAIN']
    image_2 = data_main['IMAGE_2']
    image_3 = data_main['IMAGE_3']
    image_4 = data_main['IMAGE_4']
    image_5 = data_main['IMAGE_5']
    image_6 = data_main['IMAGE_6']
    images = [x for x in [main_image, image_2, image_3, image_4, image_5, image_6] if x]
    data_main['IMAGES'] = images
You can start by selecting the columns whose names contain 'IMAGE' using DataFrame.filter, and then apply a function row-wise using DataFrame.apply that drops the NaNs of each row and converts the remaining values into a single list:
    df['IMAGES'] = (
        df.filter(like='IMAGE')
          .apply(lambda row: row.dropna().tolist(), axis=1)
    )
Note that if any of these columns contains NaNs, the rows are upcast to float, so the resulting lists will contain floats, not integers. If you want to make sure that the values are integers, use lambda row: row.dropna().astype(int).tolist().
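
For completeness, here is a minimal end-to-end sketch that also drops the original image columns, as the question asks. The sample frame is made up to mirror the layout in the question, and the names SOME_COL_A and SOME_COL_B are hypothetical stand-ins for the two SOME_COL columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's layout
df = pd.DataFrame({
    "SOME_COL_A": ["*"],
    "SOME_COL_B": ["*"],
    "IMAGE_MAIN": [0],
    "IMAGE_2": [1],
    "IMAGE_3": [2],
    "IMAGE_4": [3],
    "IMAGE_5": [np.nan],
    "IMAGE_6": [5],
})

# Remember the image columns before adding IMAGES,
# since filter(like="IMAGE") would match the new column too
image_cols = df.filter(like="IMAGE").columns

# Collect the non-NaN values of each row into a list of integers
df["IMAGES"] = df[image_cols].apply(
    lambda row: row.dropna().astype(int).tolist(), axis=1
)

# Drop the original image columns, keeping only the combined list
df = df.drop(columns=image_cols)

print(df["IMAGES"].iloc[0])  # [0, 1, 2, 3, 5]
```

Storing the column names in image_cols first means the same selection can be reused for the drop, and it keeps the new IMAGES column out of the match.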