Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Using aggregate/group_by in R to group data and give a count for each factor variable?

I have a dataframe that looks like this. I am showing the first 6 rows out of simplicity but the total number of rows is 8236. The grades range from 0-2. I have just shown grade 0 and 1 in the example below:

 Telangiectasia_time      grade
  <chr>                    <int>
1 telangiectasia_tumour_0      0
2 telangiectasia_tumour_1      0
3 telangiectasia_tumour_12     0
4 telangiectasia_tumour_24     0
5 telangiectasia_tumour_0      1
6 telangiectasia_tumour_1      1

I want to group by Telangiectasia_Time(first column) and then count the number of grades for each group. Therefore, for the first 6 rows as an example, it should look like this:

       Telangiectasia_time grade0    grade1    grade2 
1  telangiectasia_tumour_0    1      1          0
2  telangiectasia_tumour_1    1      1          0
3 telangiectasia_tumour_12    1      0          0
4 telangiectasia_tumour_24    1      0          0  

There are three columns at the end for the respective grades and a count of each grade for each variable. I tried to use the aggregate function:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

**aggregate(grade ~ Telangiectasia_time, telangiectasia_tumour_data, *sum*)** 

but I am not sure what to put in the last bit in the brackets so that the sum of each grade is returned. When I put sum, it simply adds the numbers and does not treat the variables as seperate (0,1 and 2). Using my full dataset I get the output which is wrong:

      Telangiectasia_time grade
1  telangiectasia_tumour_0    18
2  telangiectasia_tumour_1    11
3 telangiectasia_tumour_12    38
4 telangiectasia_tumour_24    87

I have also tried group_by() but this just gives me a total count

telangiectasia_tumour_data %>% group_by(Telangiectasia_time) %>% summarize(count =n())
  Telangiectasia_time      count
* <chr>                    <int>
1 telangiectasia_tumour_0   2059
2 telangiectasia_tumour_1   2059
3 telangiectasia_tumour_12  2059
4 telangiectasia_tumour_24  2059

>Solution :

With dpylr::count and tidyr::pivot_wider you could do:

library(dplyr)
library(tidyr)

telangiectasia_tumour_data %>% 
  count(Telangiectasia_time, grade) %>% 
  pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0)
#> # A tibble: 4 × 3
#>   Telangiectasia_time      grade0 grade1
#>   <chr>                     <int>  <int>
#> 1 telangiectasia_tumour_0       1      1
#> 2 telangiectasia_tumour_1       1      1
#> 3 telangiectasia_tumour_12      1      0
#> 4 telangiectasia_tumour_24      1      0

DATA

telangiectasia_tumour_data <- structure(list(Telangiectasia_time = c(
  "telangiectasia_tumour_0",
  "telangiectasia_tumour_1", "telangiectasia_tumour_12", "telangiectasia_tumour_24",
  "telangiectasia_tumour_0", "telangiectasia_tumour_1"
), grade = c(
  0L,
  0L, 0L, 0L, 1L, 1L
)), class = "data.frame", row.names = c(
  "1",
  "2", "3", "4", "5", "6"
))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading