Check for class of columns in dataframe

I have a toy data frame df with two columns, one of class integer and one of class factor. However, when I tried to check whether each column is a factor, I got an incorrect result:

num <- c(1:5)
fac <- factor(letters[1:5])
df <- data.frame(num, fac)
df
  num fac
1   1   a
2   2   b
3   3   c
4   4   d
5   5   e
cols <- colnames(df)    
for (col in cols) {
  print(col)
  print(is.factor(df$col))
}
[1] "num"
[1] FALSE
[1] "fac"
[1] FALSE

What did I do wrong? How can I check whether a column in a data frame is a factor or binary?

Solution:

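The problem is that df$col looks for a column literally named "col" in the data frame df, not for the column whose name is stored in the variable col. No such column exists, so df$col evaluates to NULL. You might expect is.factor to raise an error in that case, but is.factor(NULL) simply returns FALSE.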
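
A minimal sketch of what happens inside the loop, rebuilding the toy data frame from the question (the variable name literal is only for illustration):

```r
# Rebuild the toy data frame from the question.
df <- data.frame(num = 1:5, fac = factor(letters[1:5]))

col <- "fac"

# `$` does not evaluate `col`; it looks for a column named "col".
literal <- df$col

is.null(literal)    # the lookup finds nothing and yields NULL
is.factor(literal)  # is.factor(NULL) is FALSE, with no error or warning
```

Because the failed lookup is silent, the loop prints FALSE for every column regardless of its actual class.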

You could refer to the column a different way:

for(col in cols) {
  print(col)
  print(is.factor(df[, col]))
}

[1] "num"
[1] FALSE
[1] "fac"
[1] TRUE
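
Another way to write the same loop is with double-bracket indexing, which extracts a single column by the name held in a variable. This is a sketch, not the only option; one reason to prefer it is that df[, col] returns a one-column table rather than a vector when df is a tibble, whereas [[ returns the column itself. The result vector is_fac is a name introduced here for illustration.

```r
df <- data.frame(num = 1:5, fac = factor(letters[1:5]))

# Collect the result for each column in a named logical vector.
is_fac <- logical(0)
for (col in colnames(df)) {
  # `[[` evaluates `col`, so it looks up "num" and then "fac".
  is_fac[col] <- is.factor(df[[col]])
}

print(is_fac)
```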

A more common way to inspect column types, though, is str:

str(df)
'data.frame':   5 obs. of  2 variables:
 $ num: int  1 2 3 4 5
 $ fac: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

Or you could use sapply instead of a loop:

sapply(df, is.factor)
  num   fac 
FALSE  TRUE 

sapply(df, class, USE.NAMES = TRUE)
      num       fac 
"integer"  "factor"
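
The question also asks about binary columns, which R has no dedicated class for. One possible approach, assuming "binary" means exactly two distinct non-missing values, is a small helper applied to every column. The helper is_binary and the extra flag column are hypothetical names added for this sketch.

```r
# Assumption: a column counts as binary if it has exactly two
# distinct values once missing values are dropped.
is_binary <- function(x) {
  length(unique(x[!is.na(x)])) == 2
}

# Toy data from the question plus a hypothetical 0/1 column.
df <- data.frame(
  num  = 1:5,
  fac  = factor(letters[1:5]),
  flag = c(0, 1, 1, 0, 1)
)

# vapply behaves like sapply but checks that each result is one logical value.
binary_cols <- vapply(df, is_binary, logical(1))
print(binary_cols)
```

Under this definition a two-level factor or a logical column with both TRUE and FALSE would also count as binary; tighten the test if only 0/1 numeric columns should qualify.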