Creating dummy variables as counts using tidyverse/dplyr

July 2, 2022

Let’s say I have some data as follows:

ID    FRUIT
001   apple
002   grape
001   banana
002   apple
003   apple
001   apple

I would like to turn this into columns, like dummy variables, except that each dummy holds a count of how often that value appears in the FRUIT column. So, if apple appears two times in the FRUIT column for ID 001, the new column apple (or FRUIT_apple) is 2 for that ID.

Expected output:

ID   FRUIT_apple  FRUIT_grape  FRUIT_banana
001            2            0             1
002            1            1             0
003            1            0             0

I'm not attached to these column names; whatever is easier.

>Solution :

Using reshape2, but you could use pretty much any package that lets you reshape data from long to wide:

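    library(reshape2)

    # fruitData is the data frame from the question (columns ID and FRUIT).
    # Count the rows for each ID/FRUIT pair; value.var is named explicitly
    # so dcast() does not have to guess the value column.
    df <- dcast(fruitData, ID ~ FRUIT, value.var = "FRUIT", fun.aggregate = length)

    > df
      ID apple banana grape
    1  1     2      1     0
    2  2     1      0     1
    3  3     1      0     0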
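
Since the question asks for tidyverse/dplyr, the same long-to-wide count can be sketched with `dplyr::count()` and `tidyr::pivot_wider()`. This is a minimal sketch, not part of the original answer: the `fruitData` data frame below is rebuilt from the sample in the question (with ID assumed to be stored as character so the leading zeros survive), and the name `fruit_counts` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; ID is assumed to be character ("001", ...)
fruitData <- data.frame(
  ID    = c("001", "002", "001", "002", "003", "001"),
  FRUIT = c("apple", "grape", "banana", "apple", "apple", "apple")
)

fruit_counts <- fruitData %>%
  count(ID, FRUIT) %>%        # one row per ID/FRUIT pair, count stored in column n
  pivot_wider(
    names_from   = FRUIT,     # each fruit becomes its own column
    values_from  = n,
    names_prefix = "FRUIT_",  # gives FRUIT_apple, FRUIT_banana, FRUIT_grape
    values_fill  = 0          # ID/FRUIT pairs that never occur become 0, not NA
  )

fruit_counts
```

Drop `names_prefix` if the plain fruit names are preferred as column names. Without `values_fill`, the combinations that never occur would come back as `NA` rather than 0.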