I have a dataframe and I am trying to obtain the shortest pairwise distance of two sample points for distinct categories.
For example, my data:
GridID.1 GridID.2 Distance.m. GridID.1.binned_temp GridID.1.binned_precip GridID.1.binned_devperc GridID.2.binned_temp GridID.2.binned_precip GridID.2.binned_devperc GridID.2.binned_agriperc
1 A13 A15 11478.59 1 5 1 1 5 1 5
2 A13 A17 21079.59 1 5 1 1 5 1 4
3 A13 A8 24837.51 1 5 1 1 2 1 5
4 A13 B15 11313.88 1 5 1 1 5 1 2
5 A13 E9.avg 27312.13 1 5 1 2 3 3 3
6 A13 F10.avg 27973.76 1 5 1 3 3 4 3
7 A13 F12 23552.94 1 5 1 3 4 5 1
8 A13 F16 28017.79 1 5 1 3 4 3 2
9 A15 B15 4318.41 1 5 1 1 5 1 2
10 A15 B9 31370.86 1 5 1 1 3 1 5
11 A15 C20 26994.65 1 5 1 3 4 3 1
12 A15 C21 30116.30 1 5 1 2 5 5 2
13 A15 C7.avg 44382.24 1 5 1 1 2 1 5
I need to filter the dataframe to obtain the shortest distance between two sample points for a category (i.e., GridID.1.binned_temp
).
In that case the output would show the category I filtered for (calling it temp1 to indicate that GridID.1.binned_temp == 1
), and then the GridID.1
site ID, followed by the GridID.2
site ID that is the shortest distance away from GridID.1
site. So something like:
Category GridID.1 GridID.2 Distance
temp1 A13 B15 11313.88
temp1 A15 B15 4318.41
The code I’m using is terrible as I’m not sure how to change this code to filter by a category of my choosing, and then pick out the shortest distances using ids in GridID.1
to GridID.2
.
My code which only gives me the shortest distance I want, and then I manually search for it in the table:
data <- read.csv("testsites.csv", header=TRUE)
df <- data %>% filter(GridID.1.binned_temp == 1) %>% group_by(GridID.1) %>%
summarize(GridID.1 = min(Distance.m.))
The output is just a one column table such as below.
GridID.1
1 11313.88
2 4318.41
>Solution :
You’re looking for the slice
function, or in this particular case you can use slice_min
.
df <- data %>%
filter(GridID.1.binned_temp == 1) %>%
group_by(GridID.1) %>%
slice_min(Distance.m.)
This will give you the complete rows with the minimum distance by group. You can select()
whichever columns you want to keep and add %>% mutate(Category = "temp1")
if you need to add a column named Category
with all values set to the string "temp1"
.
I don’t get what you mean with