Find n-1 closest values based on criteria in a dataframe in R

I have a df with data from a qPCR run:

df_1 <- structure(list(
  row = c("A", "A", "A", "A", "B", "B"), 
  column = c(17L, 18L, 19L, 20L, 17L, 18L), 
  Treatment = c("Clp-1", "Clp-1","Clp-1", "Clp-1", "Clp-1", "Clp-1"), 
  Time = c("1h", "1h", "1h", "1h", "1h", "1h"), 
  Sample_Nr = c("1.1", "1.1", "1.1", "1.1", "1.2", "1.2"), 
  Target_Name = c("ClP-1", "ClP-1", "ClP-1", "ClP-1", "ClP-1", "ClP-1"), 
  Task = c("UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN","UNKNOWN"), 
  Reporter = c("SYBR", "SYBR", "SYBR", "SYBR", "SYBR", "SYBR"), 
  CT = c(30.7594337463379, 29.7701301574707,31.2958374023438, 
         29.883508682251, 28.765043258667, 28.3563442230225)), 
  row.names = c(NA, 6L), class = "data.frame")

This is an excerpt of the data frame.

For each combination of "Sample_Nr" and "Target_Name", I’m trying to find the n-1 closest Ct values and calculate their average for downstream analysis.

I found this solution online so far:

n = 4
df_1 <- df %>% group_by(Sample_Nr,Target_Name, Treatment, Time) %>% 
count("CT") %>% do(data.frame(findClosest(.$CT,n)))

Based on:
https://www.faqcode4u.com/faq/49619/how-to-find-the-three-closest-nearest-values-within-a-vector

My problem is that "n" is a fixed value, but sometimes I have only three Ct values instead of four for a set of technical replicates (the missing one is an "NA" in the df). In that case findClosest() can’t be applied to the df, because n defaults to 4 (there are usually four technical replicates per condition).

How can I still use this function with n adjusted to the number of Ct values available for each condition?

So far I’ve tried the following:

a = df %>% group_by(Sample_Nr,Target_Name, Treatment, Time) %>% filter(!is.na(CT)) 
Vector_df1 <- c(table(a$Sample_Nr, a$Target_Name))

I tried to pass "Vector_df1" as my new "n" to findClosest() but this doesn’t work.

Error message:

There were 50 or more warnings (Show first 50 warnings using warnings())

Warning:
1: Unknown or uninitialised column: CT.
2: In 0:(n - 1) : numeric expression has 81 elements: only first one is used.

49: Unknown or uninitialised column: CT.
50: In 0:(n - 1) : numeric expression has 81 elements: only first one is used.
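
Both warnings can probably be traced to specific lines. count("CT") collapses each group to a row of counts, so the CT column most likely no longer exists by the time .$CT is evaluated. The 0:(n - 1) expression named in the second warning expects a single number, so passing the 81-element Vector_df1 makes R use only its first element. Below is a minimal base-R sketch of the second effect. The two-element vector n_vec is made up for illustration and merely stands in for Vector_df1.

```r
# Hypothetical per-group counts; a stand-in for Vector_df1 from the post.
n_vec <- c(4, 3)

# `:` needs scalar endpoints. Given a longer vector it uses the first
# element only and, depending on the R version and settings, warns or errors.
# tryCatch() captures the condition message either way.
res <- tryCatch(
  0:(n_vec - 1),
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(res)
```

Because findClosest() is called once per group, the per-group size has to be computed inside each group (for example from the length of that group's CT vector) rather than passed in as one long vector.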

PS:
I apologize if this post is too long or anything. I tried to be precise and include all relevant information. It’s also my first post.

Solution:

Here is one way: change findClosest() so that n is capped at the vector length whenever the vector has fewer than n values.

suppressPackageStartupMessages({
  library(dplyr)
})

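findClosest <- function(vec, n) {
  require(zoo)
  # Cap n at the number of values available in this group.
  if(n > length(vec)) n <- length(vec)
  vec1 <- sort(vec)
  # For every window of n consecutive sorted values, store the window's
  # spread (sum of successive differences) followed by the values themselves.
  m1 <- rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i)))
  # Return the values of the window with the smallest spread.
  return(m1[which.min(m1[, 1]),][-1]) 
}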

n <- 4
df_1 %>%
  group_by(Sample_Nr, Target_Name) %>%
  summarise(Closest = findClosest(CT, n), .groups = "drop")
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> # A tibble: 6 × 3
#>   Sample_Nr Target_Name Closest
#>   <chr>     <chr>         <dbl>
#> 1 1.1       ClP-1          29.8
#> 2 1.1       ClP-1          29.9
#> 3 1.1       ClP-1          30.8
#> 4 1.1       ClP-1          31.3
#> 5 1.2       ClP-1          28.4
#> 6 1.2       ClP-1          28.8

Created on 2022-08-12 by the reprex package (v2.0.1)
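
If a missing replicate is stored as NA, the length check alone may not be enough: the NA still counts towards length(vec), while sort() drops it. Below is a minimal sketch, assuming the goal stated in the question (average the n-1 closest non-missing Ct values per group). The helper mean_closest() and the example data df_na are hypothetical names introduced here. The Ct values are rounded or made up for illustration, and base R's embed() stands in for zoo::rollapply() to build the sliding windows.

```r
# Hypothetical helper: mean of the k closest non-missing values, where k
# defaults to one fewer than the number of non-missing values.
mean_closest <- function(vec, k = NULL) {
  vec <- sort(vec[!is.na(vec)])            # drop NA first, then sort
  if (length(vec) == 0) return(NA_real_)   # nothing left to average
  if (is.null(k)) k <- length(vec) - 1
  k <- max(1, min(k, length(vec)))         # keep k within 1..length(vec)
  # Each row of embed() is one window of k consecutive sorted values
  # (stored in reverse order, which does not affect the spread or the mean).
  windows <- embed(vec, k)
  spread <- apply(windows, 1, function(w) max(w) - min(w))
  mean(windows[which.min(spread), ])
}

# Illustrative data (assumption): sample 1.2 has one missing replicate.
df_na <- data.frame(
  Sample_Nr   = rep(c("1.1", "1.2"), each = 4),
  Target_Name = "ClP-1",
  CT          = c(30.76, 29.77, 31.30, 29.88, 28.77, 28.36, NA, 29.40)
)

# na.pass keeps the NA rows so that mean_closest() handles them itself.
res <- aggregate(CT ~ Sample_Nr + Target_Name, data = df_na,
                 FUN = mean_closest, na.action = na.pass)
print(res)
```

With dplyr the same helper could be used inside the grouped pipeline, e.g. summarise(Mean_CT = mean_closest(CT), .groups = "drop"). For sorted values the max-minus-min spread equals the sum(diff(i)) criterion used in findClosest(), so both select the same window.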
