I have a CSV file like the one below. In R, it is loaded as a data frame called df_plane.
| Situation | flight_uses | People-ID |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 1 |
| 1 | 1 | 2 |
| 2 | 1 | 2 |
| 3 | 1 | 2 |
| 1 | 1 | 3 |
| 2 | 0 | 3 |
| 3 | 1 | 3 |
| 1 | 1 | 4 |
| 2 | 1 | 4 |
| 3 | 0 | 4 |
| 1 | 1 | 5 |
| 2 | 0 | 5 |
| 3 | 0 | 5 |
| 1 | 1 | 6 |
| 2 | 1 | 6 |
| 3 | NA | 6 |
| 1 | NA | 7 |
| 2 | 1 | 7 |
| 3 | 1 | 7 |
| 1 | 1 | 8 |
| 2 | 0 | 8 |
| 3 | 0 | 8 |
| 1 | NA | 9 |
| 2 | NA | 9 |
| 3 | 1 | 9 |
| 1 | 1 | 10 |
| 2 | 1 | 10 |
| 3 | 0 | 10 |
| 1 | 0 | 11 |
| 2 | 0 | 11 |
| 3 | 0 | 11 |
I would like to find out what percentage of people use an airplane in situation 2. Is there a more efficient way than the code below? With it, I only get counts and still have to calculate the percentage manually.
table(select(df_plane, Situation, flight_uses))
>Solution :
Are you asking what percentage of the rows where Situation == 2 have flight_uses == 1?
dplyr approach
dplyr is useful for these types of manipulations:
library(dplyr)
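df_plane |>
  filter(Situation == 2) |>
  summarise(
    # n() counts all Situation == 2 rows, so rows with NA in flight_uses stay in the denominator
    percent_using_plane = sum(flight_uses == 1, na.rm = TRUE) / n() * 100
  )
#   percent_using_plane
# 1            54.54545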
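Note that n() counts every Situation 2 row, including the one where flight_uses is NA. If missing answers should instead be left out of the denominator, a minimal sketch could look like the following. It assumes that dropping NA is the treatment you want, rebuilds the question's data inline so it runs on its own, and uses group_by() to get every situation at once. The names pct_by_situation, pct_all_rows and pct_non_missing are illustrative, not from the question.

```r
library(dplyr)

# Data copied from the question; the ID column is left out because it is not needed here.
df_plane <- data.frame(
  Situation   = rep(1:3, times = 11),
  flight_uses = c(1, 1, 0,  1, 1, 1,  1, 0, 1,  1, 1, 0,  1, 0, 0,
                  1, 1, NA, NA, 1, 1, 1, 0, 0,  NA, NA, 1,
                  1, 1, 0,  0, 0, 0)
)

pct_by_situation <- df_plane |>
  group_by(Situation) |>
  summarise(
    # denominator: all rows in the situation, NA included
    pct_all_rows    = sum(flight_uses == 1, na.rm = TRUE) / n() * 100,
    # denominator: only rows with a non-missing answer
    pct_non_missing = mean(flight_uses == 1, na.rm = TRUE) * 100
  )

pct_by_situation
```

For Situation 2 this gives about 54.5% over all 11 rows and 60% over the 10 non-missing answers, so which figure to report depends on how you want to treat the NA.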
base R
If you want to stick with the base R table syntax (which seems fine in this case but can become unwieldy once calculations get more complicated), you were nearly there:
table(df_plane[df_plane$Situation == 2, ]$flight_uses) / nrow(df_plane[df_plane$Situation == 2, ]) * 100
#        0        1
# 36.36364 54.54545
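The two percentages above do not sum to 100 because the NA row is counted by nrow() but not shown by table(). As a hedged base R sketch, assuming you want either to see the NA share explicitly or to drop it, prop.table() and mean() can do the arithmetic. The data is rebuilt inline so the example is self-contained, and the names sit2, pct_with_na and pct_users are illustrative.

```r
# Data copied from the question; the ID column is left out because it is not needed here.
df_plane <- data.frame(
  Situation   = rep(1:3, times = 11),
  flight_uses = c(1, 1, 0,  1, 1, 1,  1, 0, 1,  1, 1, 0,  1, 0, 0,
                  1, 1, NA, NA, 1, 1, 1, 0, 0,  NA, NA, 1,
                  1, 1, 0,  0, 0, 0)
)

# Answers given in situation 2 only
sit2 <- df_plane$flight_uses[df_plane$Situation == 2]

# Share of each answer, with NA shown as its own category
pct_with_na <- prop.table(table(sit2, useNA = "ifany")) * 100

# Share of 1s among non-missing answers only
pct_users <- mean(sit2 == 1, na.rm = TRUE) * 100

pct_with_na
pct_users
```

Here pct_with_na reproduces the 36.4% and 54.5% figures and adds the missing share, while pct_users is the percentage among people who actually answered.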