Create dummy column and input value from other column

I have data containing a list of topics (topics 1-5; and 0 meaning no topic is assigned) and their value. I want to create a new column for each topic and fill the column with the value. Here’s what the table looks like…

reviewId   topic   value
      01       2      -4
      02       2       9
      03       0      -7
      04       5      -1
      05       1      38

What should I do to create a table looking like this?

reviewId topic value t1 t2 t3 t4 t5
01 2 -4 0 -4 0 0 0
02 2 9 0 9 0 0 0
03 0 -7 0 0 0 0 0
04 5 -1 0 0 0 0 -1
05 1 38 38 0 0 0 0

Here’s reproducible data to work on.

raw_df = pd.DataFrame({
    'reviewId': ['01', '02', '03', '04', '05'],
    'topic': [2, 2, 0, 5, 1],
    'value': ['-4', '9', '-7', '-1', '38']})

Here’s a link to a similar problem with this one.

>Solution :

df.join(pd.get_dummies(df["topic"])
          .reindex(columns=range(1, 5+1), fill_value=0)
          .mul(df["value"], axis="rows")
          .add_prefix("t"))
  • get "1-hot" representation of the "topic" column
  • some topics may be missing, e.g., 3, 4 in the sample data, so reindex to guarantee that
    • and put zeros to those topics values as they were missing
  • now we have a 1/0 dataframe of shape (len(df), len(topics))
    • we multiply this with the "value" column row-wise
    • 1/0’s will act as a "selector" of values
  • add "t" from left to the column names for "t1" … "t5"

and join it with the original frame to get

  reviewId  topic  value  t1  t2  t3  t4  t5
0       01      2     -4   0  -4   0   0   0
1       02      2      9   0   9   0   0   0
2       03      0     -7   0   0   0   0   0
3       04      5     -1   0   0   0   0  -1
4       05      1     38  38   0   0   0   0

Leave a Reply