Count unique observations in a grouped variable

I have a dataframe that has about 300,000 rows so I’m trying summarise some of it.
It also has about 48 different cols, but for now I’m only interested in 2 of them.
Here’s an example of what the data look like

dput(df)
structure(list(pnum = c("1951bbn", "1951bbn", "1951bbn", "1951bbn", 
"1951bbn", "1951bbn", "1951bbn", "1951ggl", "1951ggl", "1951ggl", 
"1951ggl", "1951ggl", "1951ggl", "1951ggl", "1951ggl", "1952bbn", 
"1952bbn", "1952bbn", "1952bbn", "1952bbn", "1952bbn", "1952bbn", 
"1952bbn", "1952ggl", "1952ggl", "1952ggl", "1952ggl", "1952ggl", 
"1952ggl", "1952ggl", "1952ggl", "1952kkl", "1952kkl", "1952kkl", 
"1952kkl", "1952kkl", "1952kkl", "1952kkl", "1952kkl", "1952kkl", 
"1952llk", "1952llk", "1952llk", "1952llk", "1952llk", "1952llk", 
"1952llk", "1952llk", "1952llk", "1953hhj", "1953hhj", "1953hhj", 
"1953hhj", "1953hhj", "1953hhj", "1953hhj", "1953hhj", "1953hhj", 
"1953ggf", "1953ggf", "1953ggf", "1953ggf", "1953ggf", "1953ggf", 
"1953ggf", "1953ffl", "1953ffl", "1953ffl", "1953ffl", "1953ffl", 
"1953ffl", "1953ffl", "1953ffl"), year = c(1951L, 1951L, 1951L, 
1951L, 1951L, 1951L, 1951L, 1951L, 1951L, 1951L, 1951L, 1951L, 
1951L, 1951L, 1951L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 
1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 
1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 
1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 1952L, 
1952L, 1953L, 1953L, 1953L, 1953L, 1953L, 1953L, 1953L, 1953L, 
1953L, 1953L, 1953L, 1953L, 1953L, 1953L, 1953L, 1953L, 1953L, 
1953L, 1953L, 1953L, 1953L, 1953L, 1953L, 1953L)), class = "data.frame", row.names = c(NA, 
-73L))

What I would like to do is simply count the number of unique pnum per year

I’ve been trying variations on

#iinstall.packages("tidyverse")
library(tidyverse)

df %>% 
  group_by(year) %>%
  count(pnum)




A tibble: 9 × 3
# Groups:   year [3]
   year pnum        n
  <int> <chr>   <int>
1  1951 1951bbn     7
2  1951 1951ggl     8
3  1952 1952bbn     8
4  1952 1952ggl     8
5  1952 1952kkl     9
6  1952 1952llk     9
7  1953 1953ffl     8
8  1953 1953ggf     7
9  1953 1953hhj     9

This isn’t what I need as it counts the number of occurences of any pnum per year

If I try use n_distinct

df %>% 
  group_by(year) %>%
  count(n_distinct(pnum))

# A tibble: 3 × 3
# Groups:   year [3]
   year `n_distinct(pnum)`     n
  <int>              <int> <int>
1  1951                  9    15
2  1952                  9    34
3  1953                  9    24

This isn’t what I need either

What I need is only how many different distinct pnums are in each year.

>Solution :

You could count the unique values by grouping on the year and summarise the results with n_distinct like this:

library(dplyr)
df %>%
  group_by(year) %>%
  summarise(n = n_distinct(pnum))
#> # A tibble: 3 × 2
#>    year     n
#>   <int> <int>
#> 1  1951     2
#> 2  1952     4
#> 3  1953     3

^{Created on 2023-08-07 with reprex v2.0.2}

Count unique observations in a grouped variable

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

PL/SQL Cursor returning only last record

Using UNION and GROUP BY WITH ROLLUP for two separate queries

str_replace inside fct_reorder inside mutate(across())

Leave the blank cells in Column A and copy the rest (Google Sheet)

after migrating my code to Laravel 10, I had errors

How to make MS Excel return error if the date is wrong

Keep Up to Date with the Most Important News

Count unique observations in a grouped variable

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

PL/SQL Cursor returning only last record

Using UNION and GROUP BY WITH ROLLUP for two separate queries

str_replace inside fct_reorder inside mutate(across())

Leave the blank cells in Column A and copy the rest (Google Sheet)

after migrating my code to Laravel 10, I had errors

How to make MS Excel return error if the date is wrong

Discover more from Dev solutions