I have two large data frames (around 19,000 rows and 71 columns each), as follows:
df1
| | sample1 | sample2 | sample3 |
|---|---|---|---|
| gene1 | 5 | 10 | 15 |
| gene2 | 2 | 8 | 10 |
| gene3 | 3 | 9 | 10 |
df2
| | sample1 | sample2 | sample3 |
|---|---|---|---|
| gene1 | 40 | 50 | 65 |
| gene2 | 12 | 18 | 0 |
| gene3 | 31 | 19 | 10 |
I am trying to perform a Wilcoxon rank-sum test on the rows with the same index, but the code is taking forever on Google Colab.
My code so far:
wilc_results <- c()
for (x in 1:nrow(df1)) {
  for (y in 1:nrow(df2)) {
    result <- wilcox.test(as.numeric(df2[y, ]), as.numeric(df1[x, ]),
                          alternative = 'two.sided', paired = TRUE)
    wilc_results[length(wilc_results) + 1] <- result$p.value
  }
}
Is there a much faster way to get the desired output?
>Solution :
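There is no need to loop twice. You only want to compare rows with the same index, and both your data frames have the same number of columns, so a single loop over the row index is enough. The nested loop tests every row of df1 against every row of df2, which is roughly 19,000 × 19,000 (about 361 million) tests instead of 19,000. The single loop runs in about 10 seconds on a similarly sized dataset on my computer.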
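# Preallocate one slot per row instead of growing the list
wilc_results <- vector("list", nrow(df1))
for (i in seq_len(nrow(df1))) {
  # Compare row i of df1 with row i of df2 only
  result <- wilcox.test(as.numeric(df1[i, ]), as.numeric(df2[i, ]),
                        alternative = 'two.sided', paired = TRUE)
  wilc_results[[i]] <- result$p.value
}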
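If the single loop is still too slow, one possible further tweak (a sketch, not something I have benchmarked on your data) is to convert both data frames to numeric matrices once, since indexing rows of a data frame inside a loop is comparatively expensive, and to collect the p-values with `vapply`. The helper name `paired_wilcox_p` and the variables `m1` and `m2` are made up for illustration, and the small data frames simply reproduce the example tables from the question.

```r
# Example data copied from the tables in the question.
genes <- c("gene1", "gene2", "gene3")
df1 <- data.frame(sample1 = c(5, 2, 3), sample2 = c(10, 8, 9),
                  sample3 = c(15, 10, 10), row.names = genes)
df2 <- data.frame(sample1 = c(40, 12, 31), sample2 = c(50, 18, 19),
                  sample3 = c(65, 0, 10), row.names = genes)

# Hypothetical helper: one paired, two-sided Wilcoxon test per matching row.
# Assumes both data frames are all-numeric and have identical dimensions.
paired_wilcox_p <- function(df1, df2) {
  stopifnot(identical(dim(df1), dim(df2)))
  m1 <- as.matrix(df1)  # convert once, outside the loop
  m2 <- as.matrix(df2)
  p <- vapply(seq_len(nrow(m1)), function(i) {
    # wilcox.test() may warn about ties or zero differences; the
    # p-value is then based on a normal approximation.
    wilcox.test(m1[i, ], m2[i, ],
                alternative = "two.sided", paired = TRUE)$p.value
  }, numeric(1))
  names(p) <- rownames(df1)
  p
}

wilc_results <- paired_wilcox_p(df1, df2)
```

This returns a named numeric vector with one p-value per gene rather than a list, which is usually easier to pass on to later steps.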