I have a Pandas Dataframe that has a row for each user. Each user took a survey that captured scores from 0 – 5 for a series of survey questions. It looks something like this:
df1 = pd.DataFrame({'User': ['user_a', 'user_b', 'user_c'], 'Cat1_score': [0, 5, 3], 'Cat2_score': [0, 2, 5], 'Cat3_score': [4, 5, 1})
I want to count across the row, not the column and I just can’t wrap my head around it to call the right method(s).
If I use:
df1.count(axis='columns')
That just tells me the number of cells that are not NULL.
This following is closer to what I want, but I have 100 different columns to evaluate for each row, and I don’t want to manually have to type each out.
df1.value_counts('column_name')
What I would really like is to end up with a data frame that looks something like this:
df2 = pd.DataFrame({'User': ['user_a', 'user_b', 'user_c'], 'zero': [2, 0, 0], 'one': [0, 0, 1], 'two': [0, 1, 0], 'three': [0, 0, 1], 'four': [1, 0, 0], 'five': [0, 2, 1]})
I want to count the frequency of how many of the users’ respons = 0, or = 1, or = 5 ect. This might be a case of Friday-afternoon-at-work-lack-of-creative-thinking-brain if the answer is obvious.
>Solution :
Using a crosstab:
s = df1.set_index('User').stack()
out = (pd.crosstab(s.index.get_level_values('User'), s)
.rename_axis(index='User', columns=None).reset_index()
)
Variant:
tmp = df1.melt('User')
out = (pd.crosstab(tmp['User'], tmp['value'])
.rename_axis(columns=None).reset_index()
)
Output:
User 0 1 2 3 4 5
0 user_a 2 0 0 0 1 0
1 user_b 0 0 1 0 0 2
2 user_c 0 1 0 1 0 1