R: count distinct dates in a comma-separated character string (n_distinct(), nlevels(as.factor()) and str_count() are not working)

> test
# A tibble: 30 × 2
# Groups:   Week [30]
    Week Dates                                                                                                                 
   <dbl> <chr>                                                                                                                 
 1     2 2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08                        
 2     3 2023-10-11, 2023-10-12, 2023-10-12, 2023-10-14, 2023-10-15                                                            
 3     4 2023-10-18, 2023-10-19, 2023-10-20, 2023-10-20, 2023-10-21, 2023-10-21, 2023-10-22, 2023-10-22                        
 4     5 2023-10-25, 2023-10-25, 2023-10-26, 2023-10-27, 2023-10-28, 2023-10-29, 2023-10-29, 2023-10-30                        
 5     6 2023-11-01, 2023-11-01, 2023-11-01, 2023-11-01, 2023-11-02, 2023-11-02, 2023-11-03, 2023-11-04, 2023-11-05, 2023-11-05
 6     7 2023-11-09, 2023-11-10, 2023-11-13                                                                                    
 7     8 2023-11-16, 2023-11-17, 2023-11-18, 2023-11-19, 2023-11-21                                                            
 8     9 2023-11-22, 2023-11-22, 2023-11-23                                                                                    
 9    10 2023-11-29, 2023-11-30, 2023-12-02, 2023-12-03, 2023-12-04                                                            
10    11 2023-12-06, 2023-12-07, 2023-12-08, 2023-12-08, 2023-12-09, 2023-12-10, 2023-12-10                                    
# ℹ 20 more rows

The dates were pasted together with commas, so each week's dates are stored as a single character string in the data set ‘test’.
I need to count the unique dates for each week.
For example, the count for week 2 should be 4 (2023-10-04, 2023-10-05, 2023-10-06, 2023-10-08) and the count for week 3 should be 4 (2023-10-11, 2023-10-12, 2023-10-14, 2023-10-15), and so on.

This is what I tried:

> with(test, tapply(Dates, Week, function(x) nlevels(unique(as.factor(x)))))
 2  3  4  5  6  7  8  9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
> with(test, sapply(Dates, function(x) nlevels(unique(as.factor(x)))))
                                    2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08 
                                                                                                                                 1 
                                                                        2023-10-11, 2023-10-12, 2023-10-12, 2023-10-14, 2023-10-15 
                                                                                                                                 1 
                                    2023-10-18, 2023-10-19, 2023-10-20, 2023-10-20, 2023-10-21, 2023-10-21, 2023-10-22, 2023-10-22 
                                                                                                                                 1 
                                    2023-10-25, 2023-10-25, 2023-10-26, 2023-10-27, 2023-10-28, 2023-10-29, 2023-10-29, 2023-10-30 
                                                                                                                                 1 
            2023-11-01, 2023-11-01, 2023-11-01, 2023-11-01, 2023-11-02, 2023-11-02, 2023-11-03, 2023-11-04, 2023-11-05, 2023-11-05 
                                                                                                                                 1 
                                                                                                2023-11-09, 2023-11-10, 2023-11-13 
                                                                                                                                 1 
> n_distinct(unique(as.factor(test$Dates[1])))
[1] 1

Each row is treated as one chunk, so every count comes out as 1.

> unique(factor(str_split(test$Dates[1], ',')))
[1] c("2023-10-04", " 2023-10-05", " 2023-10-05", " 2023-10-06", " 2023-10-06", " 2023-10-06", " 2023-10-08", " 2023-10-08")
Levels: c("2023-10-04", " 2023-10-05", " 2023-10-05", " 2023-10-06", " 2023-10-06", " 2023-10-06", " 2023-10-08", " 2023-10-08")
> unique(str_split(test$Dates[1], ','))
[[1]]
[1] "2023-10-04"  " 2023-10-05" " 2023-10-05" " 2023-10-06" " 2023-10-06" " 2023-10-06" " 2023-10-08" " 2023-10-08"

> nlevels(factor(str_split(test$Dates[1], ',')))
[1] 1

Splitting the string does not give the distinct (unique) count either.

Solution:
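The attempts in the question return 1 because the counting functions see either the whole unsplit string or the one-element list that the split functions return, not the individual dates inside it. The following is a minimal base R sketch of that difference; the variable names dates and parts are made up for illustration, and the string is a shortened version of the first row.

```r
# One comma-separated string, shortened from the first row of the question's data
dates <- "2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06"

# strsplit() returns a list with one element per input string
parts <- strsplit(dates, ", ")
length(parts)        # the list itself has a single element
length(parts[[1]])   # the character vector inside it has four entries

# Counting on the unsplit string sees a single value
length(unique(dates))

# Counting on the extracted vector sees the individual dates
length(unique(parts[[1]]))
```

This is why the counting step has to run on each list element (for example via sapply() or map_int()) rather than on the list as a whole.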

Example data:

x <- c(
    "2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08",
    "2023-10-11, 2023-10-12, 2023-10-12, 2023-10-14, 2023-10-15"
)

Count the unique dates in each element with base R, for example:

x |> strsplit(', ') |> sapply(\(x) length(unique(x)))

Or using tidyverse:

library(tidyverse)  # attaches stringr (str_split), purrr (map_int) and dplyr (n_distinct)
x |> str_split(', ') |> map_int(n_distinct)

Both give

[1] 4 4
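To get one count per week, the same idea can be applied to the Dates column of the data set. The sketch below uses base R only and rests on some assumptions: test here is a two-row stand-in for the question's tibble, n_dates is a hypothetical column name, and the split pattern allows optional whitespace after the comma so that entries with and without a leading space are counted as the same date.

```r
# Small stand-in for the question's `test` (first two weeks only)
test <- data.frame(
  Week  = c(2, 3),
  Dates = c(
    "2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08",
    "2023-10-11, 2023-10-12, 2023-10-12, 2023-10-14, 2023-10-15"
  ),
  stringsAsFactors = FALSE
)

# Split on a comma followed by optional whitespace, so " 2023-10-05" and
# "2023-10-05" are not treated as different values
date_list <- strsplit(test$Dates, ",\\s*")

# Count the unique dates per row; n_dates is a hypothetical column name
test$n_dates <- lengths(lapply(date_list, unique))
test$n_dates
```

For these two rows the result matches the counts expected in the question (4 for week 2 and 4 for week 3).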