Create a new variable based on two variables in my dataset in R

I would like to add a column to my dataset that holds, for each user, the positive sentiment total minus the negative sentiment total, both taken from my total column.

So for user Alex, who has a positive sentiment sum of 80 and a negative sentiment sum of 13, the subtracted score will be 67.

The issue I am having is grouping the sentiment column in a way that allows me to perform this operation.

library(tidyverse)

# create mock dataframe
users <- c("Alex", "Alice", "Alexandra", "Andrew", "Alicia", "Alex", "Alice", "Alexandra", "Andrew", "Alicia")
sentiment <- c("positive", "negative", "positive","negative", "positive", "negative", "positive", "negative","positive", "negative")
total <- c(80, 70, 24, 74, 66, 13, 35, 94, 27, 94)

# build the tibble directly so total stays numeric (cbind() would coerce every column to character)
mockdataframe <- tibble(users, sentiment = factor(sentiment), total)

# using case_when() this way does not work
mockdataframe %>% 
  mutate(Subtraction = case_when(
    sentiment == "positive" ~ (sentiment == "negative")/mockdataframe$total))

I am really struggling trying to solve this. Any help would be appreciated.

Solution:

Using tidyr::pivot_wider you could do:

library(tidyverse)

mockdataframe %>% 
  pivot_wider(names_from = sentiment, values_from = total) %>%
  mutate(Subtraction = positive - negative)
#> # A tibble: 5 × 4
#>   users     positive negative Subtraction
#>   <chr>        <dbl>    <dbl>       <dbl>
#> 1 Alex            80       13          67
#> 2 Alice           35       70         -35
#> 3 Alexandra       24       94         -70
#> 4 Andrew          27       74         -47
#> 5 Alicia          66       94         -28
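
The pivot above relies on every user having both a positive and a negative row; otherwise the missing cell becomes NA and so does the difference. As a minimal sketch, assuming a missing sentiment should count as 0, the `values_fill` argument of `pivot_wider()` can fill those cells. The `scores` data and the user "Ben" are hypothetical and only serve the illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: "Ben" has a positive row but no negative row
scores <- tibble(
  users = c("Alex", "Alex", "Ben"),
  sentiment = c("positive", "negative", "positive"),
  total = c(80, 13, 50)
)

wide_scores <- scores %>%
  # values_fill = 0 replaces the NA that would otherwise appear for Ben's negative score
  pivot_wider(names_from = sentiment, values_from = total, values_fill = 0) %>%
  mutate(Subtraction = positive - negative)

wide_scores
```

Whether 0 is the right default depends on the data; leaving the NA in place may be preferable if a missing row means "unknown" rather than "none".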

Or using group_by:

mockdataframe %>% 
  group_by(users) %>%
  mutate(Subtraction = total[sentiment == "positive"] - total[sentiment == "negative"]) %>%
  ungroup()
#> # A tibble: 10 × 4
#>    users     sentiment total Subtraction
#>    <chr>     <fct>     <dbl>       <dbl>
#>  1 Alex      positive     80          67
#>  2 Alice     negative     70         -35
#>  3 Alexandra positive     24         -70
#>  4 Andrew    negative     74         -47
#>  5 Alicia    positive     66         -28
#>  6 Alex      negative     13          67
#>  7 Alice     positive     35         -35
#>  8 Alexandra negative     94         -70
#>  9 Andrew    positive     27         -47
#> 10 Alicia    negative     94         -28
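
The group_by version above assumes exactly one positive and one negative row per user. If a user could have several rows per sentiment, or if one row per user is enough, a possible variant (a sketch, not the answerer's method) sums each sentiment inside `summarise()`. The name `net_sentiment` is made up for this example, and the mock data is rebuilt so the block runs on its own:

```r
library(dplyr)

# Mock data mirroring the question
mockdataframe <- tibble(
  users = rep(c("Alex", "Alice", "Alexandra", "Andrew", "Alicia"), times = 2),
  sentiment = factor(c("positive", "negative", "positive", "negative", "positive",
                       "negative", "positive", "negative", "positive", "negative")),
  total = c(80, 70, 24, 74, 66, 13, 35, 94, 27, 94)
)

net_sentiment <- mockdataframe %>%
  group_by(users) %>%
  summarise(
    # sum() of an empty selection is 0, so a user missing one sentiment still gets a score
    Subtraction = sum(total[sentiment == "positive"]) - sum(total[sentiment == "negative"]),
    .groups = "drop"
  )

net_sentiment
```

This returns one row per user instead of repeating the score on every original row; swap `summarise()` for `mutate()` if the original rows should be kept.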