Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R Count unique values without specific symbol

I have a dataframe ‘df’ that has categorical and POSIXct columns. The data look like:

Category DateTime
A 2022-08-29 00:00:00
A 2022-08-29 00:00:00
A 1 2022-08-29 00:00:00
A 1 2022-08-29 00:00:00
A 1 2022-08-29 00:00:00
B 2022-08-29 00:00:00
B 2022-08-29 00:00:00
B 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
A 2022-08-29 02:00:00
A 1 2022-08-29 02:00:00
B 2022-08-29 02:00:00
B 2022-08-29 02:00:00
B 2022-08-29 02:00:00
B 1 2022-08-29 02:00:00
B 1 2022-08-29 02:00:00
B 1 2022-08-29 02:00:00

I would like to create a new dataframe ‘df2’ that counts each unique value in column ‘Category’ that doesn’t end with " 1" by column ‘DateTime’, so that the results look like:

Category DateTime CatCount
A 2022-08-29 00:00:00 2
B 2022-08-29 00:00:00 3
A 2022-08-29 02:00:00 1
B 2022-08-29 02:00:00 3

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We could filter output the rows having 1, then do a count

library(dplyr)
library(stringr)
df1 %>%
   filter(str_detect(Category, "\\s+1", negate = TRUE)) %>%
   count(Category, DateTime, name = "CatCount")

-output

 Category            DateTime CatCount
1        A 2022-08-29 00:00:00        2
2        A 2022-08-29 02:00:00        1
3        B 2022-08-29 00:00:00        3
4        B 2022-08-29 02:00:00        3

data

df1 <- structure(list(Category = c("A", "A", "A 1", "A 1", "A 1", "B", 
"B", "B", "B 1", "B 1", "B 1", "B 1", "B 1", "A", "A 1", "B", 
"B", "B", "B 1", "B 1", "B 1"), DateTime = c("2022-08-29 00:00:00", 
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00", 
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00", 
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00", 
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00", 
"2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00", 
"2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00", 
"2022-08-29 02:00:00", "2022-08-29 02:00:00")),
class = "data.frame", row.names = c(NA, 
-21L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading