Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Appending iterated rows to new dataframe

I’m new to python, and I could use a push in the right direction for what I think should be (?) a pretty simple problem. I’ve got a dataframe (genres_df) with one column:

              0
0        Horror
1        Comedy
2       Fantasy
3     Adventure
4         Drama
5     Animation
6         Crime
...

and a dataframe (df) with 3 columns–one for each genre associated with the film, and one row for each film I’m looking at):

    0   1   2
0   Horror  Short   None
1   Horror  Short   None
2   Comedy  Horror  Short
3   Comedy  Horror  Short
4   Fantasy Horror  Short
...

I want to count the number of rows in the dataframe genres that contain each item in genres_df. I was able to do this by hand, with a sum line:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

sum(df[0] == 'Comedy') + sum(df[1] == 'Comedy') + sum(df[2] == 'Comedy')

I know this works, because there is a Horror item in each row, and there are 78471 rows in df.

I want to get a dataframe that has two columns: the genre (from genres_df) and the count of rows in which that genre appears, across any of the columns in df. Like so

  0      1
0 Horror 78471
1 Comedy 9903
...

Here’s what I’ve got so far:

df_counts = pd.DataFrame(columns = ['genre','count'])
for i in genres_df[0]:
    s_row = pd.Series(i,sum(df[0]==i)+sum(df[1]==i)+sum(df[2]==i))
    df_counts.append(s_row,ignore_index=True)

But this doesn’t work. It seems to be the closest I’ve gotten, though. Help?

>Solution :

I think what you need to do is the following

  • melt your dataframe (i.e turn the column into rows)
  • apply a groupby operation & count

Finally.

  • filter the genre column with the genre’s from your first df

df1 =  pd.melt(df,value_name='genre').groupby('genre',
                            as_index=False).agg(count=('genre','count'))  


print(df1)

     genre  count
0   Comedy      2
1  Fantasy      1
2   Horror      5
3     None      2
4    Short      5

final = df1[df1['genre'].isin(all_genre['0'])]

     genre  count
0   Comedy      2
1  Fantasy      1
2   Horror      5

As a side note – if you’re using methods outside of the pandas API then you’re most likely doing something incorrectly (as in against the design of the library)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading