Calculation of a ratio of columns to create a model

I have a data set in which I am trying to model the rate of TB cases per unit of population. Am I correct in thinking that finding the rate of TB per unit of population is as simple as this?

rate <- TBdata$TB/TBdata$Population

My data frame is called TBdata and has the following variables:

head(TBdata)
  Indigenous Illiteracy Urbanisation Density Poverty Poor_Sanitation Unemployment Timeliness  Year    TB Population Region   lon    lat    
1      0.335       6.35         84.1   0.714    31.3            15.3         5.41       59.2  2012   323     559543  11001 -60.7 -12.1  0.000577
2      6.45        8.49         71.4   0.743    48.6            29.4         5.92       58.1  2012    15      73193  11002 -64.0  -9.43

Solution:

Yes! R is vectorized, which means arithmetic operators such as / are applied element by element to whole vectors, including data frame columns.

In many programming languages we would need a for loop for this kind of calculation:

# Preallocate the result vector, one element per row
r <- numeric(nrow(TBdata))
for (i in seq_len(nrow(TBdata))) {
  r[i] <- TBdata[i, 'TB'] / TBdata[i, 'Population']
}
r
# [1] 0.0005772568 0.0002049376

whereas in R we simply do:

TBdata$TB/TBdata$Population
# [1] 0.0005772568 0.0002049376

This isn't magic, of course. Under the hood the division is passed to a C implementation that still runs a for loop over the elements; the difference is that the same loop written in R would be much slower.


Data:

TBdata <- structure(list(Indigenous = c(0.335, 6.45), Illiteracy = c(6.35, 
    8.49), Urbanisation = c(84.1, 71.4), Density = c(0.714, 0.743), 
    Poverty = c(31.3, 48.6), Poor_Sanitation = c(15.3, 29.4), 
    Unemployment = c(5.41, 5.92), Timeliness = c(59.2, 58.1), 
    Year = c(2012L, 2012L), TB = c(323L, 15L), Population = c(559543L, 
    73193L), Region = 11001:11002, lon = c(-60.7, -64), lat = c(-12.1, 
    -9.43)), class = "data.frame", row.names = c(NA, -2L))
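
If the rate is going to be used in a model, it may be convenient to keep it in the data frame as its own column. The sketch below is only an illustration: it rebuilds a two-row data frame from the TB and Population values shown in the question's head() output, and the column names rate and rate_per_100k are hypothetical. Scaling by 100,000 is an assumed reporting convention, not something the question asks for.

```r
# Two-row data frame using the TB and Population values from the question.
TBdata <- data.frame(
  TB         = c(323L, 15L),
  Population = c(559543L, 73193L)
)

# Vectorized division: one rate per row, stored as a new column.
# "rate" is a hypothetical column name chosen for this sketch.
TBdata$rate <- TBdata$TB / TBdata$Population

# Assumed convention: express the rate per 100,000 people for readability.
TBdata$rate_per_100k <- TBdata$rate * 1e5

print(TBdata)
```

Because the division is element-wise, the new columns always have one value per row, however many rows the data frame has.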