Appending iterated rows to new dataframe

February 17, 2022

I’m new to python, and I could use a push in the right direction for what I think should be (?) a pretty simple problem. I’ve got a dataframe (genres_df) with one column:

              0
0        Horror
1        Comedy
2       Fantasy
3     Adventure
4         Drama
5     Animation
6         Crime
...

and a dataframe (df) with 3 columns–one for each genre associated with the film, and one row for each film I’m looking at):

    0   1   2
0   Horror  Short   None
1   Horror  Short   None
2   Comedy  Horror  Short
3   Comedy  Horror  Short
4   Fantasy Horror  Short
...

I want to count the number of rows in the dataframe genres that contain each item in genres_df. I was able to do this by hand, with a sum line:

sum(df[0] == 'Comedy') + sum(df[1] == 'Comedy') + sum(df[2] == 'Comedy')

I know this works, because there is a Horror item in each row, and there are 78471 rows in df.

I want to get a dataframe that has two columns: the genre (from genres_df) and the count of rows in which that genre appears, across any of the columns in df. Like so

  0      1
0 Horror 78471
1 Comedy 9903
...

Here’s what I’ve got so far:

df_counts = pd.DataFrame(columns = ['genre','count'])
for i in genres_df[0]:
    s_row = pd.Series(i,sum(df[0]==i)+sum(df[1]==i)+sum(df[2]==i))
    df_counts.append(s_row,ignore_index=True)

But this doesn’t work. It seems to be the closest I’ve gotten, though. Help?

>Solution :

I think what you need to do is the following

melt your dataframe (i.e turn the column into rows)
apply a groupby operation & count

Finally.

filter the genre column with the genre’s from your first df

df1 =  pd.melt(df,value_name='genre').groupby('genre',
                            as_index=False).agg(count=('genre','count'))  


print(df1)

     genre  count
0   Comedy      2
1  Fantasy      1
2   Horror      5
3     None      2
4    Short      5

final = df1[df1['genre'].isin(all_genre['0'])]

     genre  count
0   Comedy      2
1  Fantasy      1
2   Horror      5

As a side note – if you’re using methods outside of the pandas API then you’re most likely doing something incorrectly (as in against the design of the library)