Lengthening a Pandas DataFrame by setting column headers as row values and having a value column

August 20, 2022

I am a bit stuck with how to reshape my dataframe into a shape that offers me more flexibility.

My current dataframe is as follows.

import pandas as pd

Original_df = pd.DataFrame([['Action', 1, 5, 3],
                   ['Comedy', 2, 4, 6],
                   ['Drama', 3, 2, 10],
                   ['Crime', 1, 6, 6],
                   ['Documentary', 2, 9, 3]],
                  columns=['Genre', 'Bob', 'Sara', 'Peter'])
Original_df.head()

The shape I want my dataframe to be in is as follows:

Wanted_df = pd.DataFrame([['Action', 'Bob', 1], 
                        ['Comedy', 'Bob', 2],
                        ['Drama', 'Bob', 3], 
                        ['Crime', 'Bob', 1],
                        ['Documentary', 'Bob', 2],
                        ['Action', 'Sara', 5], 
                        ['Comedy', 'Sara', 4],
                        ['Drama', 'Sara', 2], 
                        ['Crime', 'Sara', 6],
                        ['Documentary', 'Sara', 9],
                        ['Action', 'Peter', 3], 
                        ['Comedy', 'Peter', 6],
                        ['Drama', 'Peter', 10], 
                        ['Crime', 'Peter', 6],
                        ['Documentary', 'Peter', 3]],
                  columns=['Genre', 'Name', 'Count'])
Wanted_df.head()

I have tried concatenating the columns in a loop:

df_movies_genre_frequency_test = df_movies_genre_frequency[['index']]
for user in users:
     df_movies_genre_frequency_test = pd.concat(df_movies_genre_frequency_test + [df_movies_genre_frequency[['index', user]]])

df_movies_genre_frequency_test.head(40)

I have also tried df.melt(…), but could not get the result I wanted.

Any help on how to solve this is very much appreciated 🙏

Solution:

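pandas.melt() will do the job if you keep Genre as the identifier column with id_vars=['Genre']. Every other column header then becomes a value in the column named by var_name, and the cell values go into the column named by value_name: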

df.melt(id_vars=['Genre'], var_name='Name', value_name='Count')

Example

import pandas as pd

df = pd.DataFrame([['Action', 1, 5, 3],
                   ['Comedy', 2, 4, 6],
                   ['Drama', 3, 2, 10],
                   ['Crime', 1, 6, 6],
                   ['Documentary', 2, 9, 3]],
                  columns=['Genre', 'Bob', 'Sara', 'Peter'])
# Keep Genre fixed; unpivot Bob, Sara and Peter into Name/Count pairs.
df.melt(id_vars=['Genre'], var_name='Name', value_name='Count')

Output

	Genre	Name	Count
0	Action	Bob	1
1	Comedy	Bob	2
2	Drama	Bob	3
3	Crime	Bob	1
4	Documentary	Bob	2
5	Action	Sara	5
6	Comedy	Sara	4
7	Drama	Sara	2
8	Crime	Sara	6
9	Documentary	Sara	9
10	Action	Peter	3
11	Comedy	Peter	6
12	Drama	Peter	10
13	Crime	Peter	6
14	Documentary	Peter	3
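
For comparison, the loop-and-concat idea from the question can also be made to work. The sketch below is only an illustration under some assumptions: it uses the example df with its Genre column rather than the question's df_movies_genre_frequency and 'index' column, and the users list and the looped, parts and melted names are made up here. The key difference from the attempt in the question is that pd.concat is given a list of frames, once, after the loop.

```python
import pandas as pd

df = pd.DataFrame([['Action', 1, 5, 3],
                   ['Comedy', 2, 4, 6],
                   ['Drama', 3, 2, 10],
                   ['Crime', 1, 6, 6],
                   ['Documentary', 2, 9, 3]],
                  columns=['Genre', 'Bob', 'Sara', 'Peter'])

# Assumed for illustration: every column except Genre is a user column.
users = [col for col in df.columns if col != 'Genre']

# Build one three-column frame per user, then concatenate the whole list once.
parts = []
for user in users:
    part = df[['Genre', user]].rename(columns={user: 'Count'})
    part.insert(1, 'Name', user)  # constant column holding the former header
    parts.append(part)

looped = pd.concat(parts, ignore_index=True)

# The one-line melt from the answer, kept here for a side-by-side comparison.
melted = df.melt(id_vars=['Genre'], var_name='Name', value_name='Count')

print(looped)
```

Both frames should hold the same rows in the same order, so melt is usually the simpler choice; the loop mainly helps to see what melt does under the hood.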