# Drop rows which are duplicates regarding certain columns

I want to identify and remove observations which are duplicates in certain aspects. In my example, I want to get rid of rows 1 and 6, as they are the same in both `V1` and `V2`. That they differ in `V3` shouldn't matter. `df <- data.frame(V1 = c("a","b","c","a","c","a"), V2 = c(1,2,1,2,3,1), V3 = c(1,2,3,4,5,6))` Applying…
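A minimal base R sketch of one way to do this, assuming the goal is to compare rows on `V1` and `V2` only. The names `key`, `is_dup`, `df_all_removed` and `df_first_kept` are made up for illustration. The excerpt can be read two ways (drop every row that has a twin, or keep the first of each set), so both are shown.

```r
# Example data from the question
df <- data.frame(V1 = c("a", "b", "c", "a", "c", "a"),
                 V2 = c(1, 2, 1, 2, 3, 1),
                 V3 = c(1, 2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)

# Only these columns decide whether two rows count as duplicates
key <- df[, c("V1", "V2")]

# duplicated() flags the second and later occurrences; scanning from the
# end as well flags the first occurrence too
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Reading 1: drop every row that has a twin (rows 1 and 6 here)
df_all_removed <- df[!is_dup, ]

# Reading 2: keep the first row of each duplicate set (drops only row 6)
df_first_kept <- df[!duplicated(key), ]
```

If dplyr is already in use, `distinct(df, V1, V2, .keep_all = TRUE)` corresponds to the second reading.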

# Adding 5th and 95th Percentile to List in R

# Summarize row values based on other column value

I'll illustrate my question with an example. I have this data: individualIndex testPhase total_correct 1 0 01 7 2 0 02 7 3 0 03 6 4 0 04 5 5 0 05 9 6 0 06 10 7 0 07 5 8 0 08 9 9 0 09 6 10 0 10 9 11…

# Grouping by a factor, return 1 if characters match on any row – R

My dataset is grouped by `RunID`, with a separate row for every diagnosis the patient has. I'm trying to create a new variable indicating whether a particular diagnosis (e.g. pneumonia) appears on any row within that `RunID`. I tried using `any` but received the error message Caused by warning in `any()`: ! coercing argument of…
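A possible base R sketch, assuming the diagnosis text lives in a column called `Diagnosis` (a hypothetical name, as are the sample values). The coercion warning typically appears when `any()` is handed a character vector, so the sketch first turns the text into a 0/1 match with `grepl()` and then takes the per-group maximum.

```r
# Hypothetical data: one row per diagnosis, several rows per RunID
df <- data.frame(RunID = c(1, 1, 2, 2, 3),
                 Diagnosis = c("pneumonia", "flu", "flu", "asthma",
                               "Pneumonia, bacterial"),
                 stringsAsFactors = FALSE)

# 1 where the row mentions pneumonia, 0 otherwise
row_match <- as.integer(grepl("pneumonia", df$Diagnosis, ignore.case = TRUE))

# ave() applies max() within each RunID and repeats the result on every
# row of that group, so any single match marks the whole group
df$has_pneumonia <- ave(row_match, df$RunID, FUN = max)
```

In a dplyr pipeline the same idea would be `group_by(RunID)` followed by `mutate()` with `any(grepl(...))`, since `any()` expects logical input.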

# Why to quote in dplyr and why not to quote in dplyr?

I am teaching an intro to R course and a student asked me a question that I cannot answer. The question is, why do we not put id and sex in quotes after select in this example df1 %>% select(id, sex) but we put id in quotes after inner_join in this example df1 %>% inner_join(df2,…
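A small sketch that may help show the difference, assuming dplyr is installed. The data frames below are invented for illustration. `select()` uses tidy-select, which looks names up inside the data frame, so bare names work (and quoted names are accepted too). The `by` argument of `inner_join()` is an ordinary character vector naming the key columns shared by the two tables.

```r
suppressPackageStartupMessages(library(dplyr))

# Invented example tables sharing an id column
df1 <- data.frame(id = 1:3, sex = c("F", "M", "F"), age = c(30, 41, 25))
df2 <- data.frame(id = 2:4, score = c(10, 20, 30))

# Bare names: tidy-select resolves id and sex as columns of df1
bare <- df1 %>% select(id, sex)

# Quoted names are also accepted by select()
quoted <- df1 %>% select("id", "sex")

# by is a plain character vector of key column names
joined <- df1 %>% inner_join(df2, by = "id")
```

Newer dplyr versions (1.1.0 and later) also offer `join_by(id)`, which lets the join key be written without quotes.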

# Efficiently assigning multiple variables created from a subset of grouped data in R

I am trying to improve the efficiency of code that is already working. For instance, consider the toy dataset below: df = data.frame(id = c(1,1,2,2,2,3,3,3), date = c(12,13,1,4,5,9,10,12), visit = c("out","in","in","out","out","out","in","in")) df id date visit 1 12 out 1 13 in 2 1 in 2 4 out 2 5 out 3 9 out 3 10…

# Create a column from another column based on keywords

Based on the data below, how can I add a third Type column? The type of hospital will be determined based on certain words in the hospital names. Word Type Government Government Govt Government St Jude Religious Catholic Religious District District Community Community Divine Mercy Religious St. Luke Religious St. Theresa Religious Islamic Religious…
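A rough base R sketch of a keyword lookup, using only the first few Word/Type pairs visible in the excerpt. The hospital names and the `Name` column are hypothetical, and the rule that the first matching keyword wins is an assumption.

```r
# Keyword table: a subset of the pairs listed in the question
keywords <- data.frame(
  Word = c("Government", "Govt", "St Jude", "Catholic", "District", "Community"),
  Type = c("Government", "Government", "Religious", "Religious", "District", "Community"),
  stringsAsFactors = FALSE
)

# Hypothetical hospital names
hospitals <- data.frame(
  Name = c("Govt General Hospital", "St Jude Clinic",
           "Lakeside Community Hospital", "Riverside Medical"),
  stringsAsFactors = FALSE
)

# Start with no type, then fill in the first keyword that matches
hospitals$Type <- NA_character_
for (i in seq_len(nrow(keywords))) {
  hit <- is.na(hospitals$Type) &
    grepl(keywords$Word[i], hospitals$Name, fixed = TRUE)
  hospitals$Type[hit] <- keywords$Type[i]
}
```

Names that match no keyword keep `NA`, which makes unclassified hospitals easy to spot.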

# Add column with repeated numbers per group

I want to add a new column with a 3-number repeat 1,1,1,2,2,2,3,3,3 until the end of each group (Chr) within the data frame. This would be easy if all the groups can be divided by 3, but I am not sure how to do this when the group length is divisible by 2. What happens…
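One way this could be sketched in base R, assuming the grouping column is `Chr` as in the question. The sample data and the column name `block` are invented. Integer division of the within-group position gives the 1,1,1,2,2,2 pattern and simply leaves a shorter final block when a group's length is not a multiple of 3.

```r
# Invented data: group A has 7 rows, group B has 4
df <- data.frame(Chr = c(rep("A", 7), rep("B", 4)),
                 stringsAsFactors = FALSE)

# Within each Chr, number the rows 1..n, then map positions
# 1-3 to 1, 4-6 to 2, 7-9 to 3, and so on
df$block <- ave(seq_along(df$Chr), df$Chr,
                FUN = function(i) (seq_along(i) - 1) %/% 3 + 1)
```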

# Filter rows with common entry in two columns

I have the following mock data: `df <- data.frame(Col1=c("cat","dog","man","man","cat","cat"), Col2=c("cat","dog","dog","dog","dog","cat"))` I want to filter out the rows which have the same name in both columns. In other words, I want to be left with unique names across each row. So in my example, I will be left with an output like: Col1 Col2 man dog…
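A minimal base R sketch, assuming "the same name in both columns" means `Col1` equals `Col2` on that row. The name `result` is illustrative. The `as.character()` calls guard against the columns being factors with different level sets, which would make a direct comparison fail.

```r
# Mock data from the question
df <- data.frame(Col1 = c("cat", "dog", "man", "man", "cat", "cat"),
                 Col2 = c("cat", "dog", "dog", "dog", "dog", "cat"))

# Keep only the rows where the two columns differ
result <- df[as.character(df$Col1) != as.character(df$Col2), ]
```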